MOLECULES FOR DISEASE DETECTION AND TREATMENT



TECHNICAL FIELD 

The present invention relates to molecules for disease detection and treatment and to the use 
of these sequences in the diagnosis, study, prevention, and treatment of diseases associated with 
disease detection and treatment molecules. 

BACKGROUND OF THE INVENTION 

The human genome comprises thousands of genes, many encoding gene products that
function in the maintenance and growth of the various cells and tissues in the body. Aberrant
expression or mutations in these genes and their products are the cause of, or are associated with, a
variety of human diseases such as cancer and other cell proliferative disorders. The identification of
these genes and their products is the basis of an ever-expanding effort to find markers for early
detection of diseases, and targets for their prevention and treatment.

For example, cancer represents a type of cell proliferative disorder that affects nearly every 
tissue in the body. A wide variety of molecules, either aberrantly expressed or mutated, can be the 
cause of, or involved with, various cancers because tissue growth involves complex and ordered 
patterns of cell proliferation, cell differentiation, and apoptosis. Cell proliferation must be regulated
to maintain both the number of cells and their spatial organization. This regulation depends upon the 
appropriate expression of proteins which control cell cycle progression in response to extracellular 
signals such as growth factors and other mitogens, and intracellular cues such as DNA damage or 
nutrient starvation. Molecules which directly or indirectly modulate cell cycle progression fall into 
several categories, including growth factors and their receptors, second messenger and signal 
transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors. 
Aberrant expression or mutations in any of these gene products can result in cell proliferative
disorders such as cancer. Oncogenes are genes generally derived from normal genes that, through 
abnormal expression or mutation, can effect the transformation of a normal cell to a malignant one 
(oncogenesis). Oncoproteins, encoded by oncogenes, can affect cell proliferation in a variety of ways 
and include growth factors, growth factor receptors, intracellular signal transducers, nuclear 
transcription factors, and cell-cycle control proteins. In contrast, tumor-suppressor genes are
involved in inhibiting cell proliferation. Mutations which cause reduced function or loss of function in
tumor-suppressor genes result in aberrant cell proliferation and cancer. Thus a wide variety of genes
and their products have been found that are associated with cell proliferative disorders such as cancer, 
but many more may exist that are yet to be discovered. 

DNA-based arrays can provide a simple way to explore the expression of a single 
polymorphic gene or a large number of genes. When the expression of a single gene is explored, 
DNA-based arrays are employed to detect the expression of specific gene variants. For example, a
p53 tumor suppressor gene array is used to determine whether individuals are carrying mutations that 
predispose them to cancer. A cytochrome p450 gene array is useful to determine whether individuals 
have one of a number of specific mutations that could result in increased drug metabolism, drug 
resistance or drug toxicity. 

DNA-based array technology is especially relevant for the rapid screening of expression of a 
large number of genes. There is a growing awareness that gene expression is affected in a global 
fashion. A genetic predisposition, disease or therapeutic treatment may affect, directly or indirectly, 
the expression of a large number of genes. In some cases the interactions may be expected, such as 
when the genes are part of the same signaling pathway. In other cases, such as when the genes 
participate in separate signaling pathways, the interactions may be totally unexpected. Therefore,
DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic 
treatment affects the expression of a large number of genes. 

The discovery of new molecules for disease detection and treatment satisfies a need in the art 
by providing new compositions which are useful in the diagnosis, study, prevention, and treatment of 
diseases. 



SUMMARY OF THE INVENTION 

The present invention relates to human polynucleotides encoding molecules for disease 
detection and treatment (mddt) as presented in the Sequence Listing. Some of the mddt uniquely 
identify genes encoding structural, functional, and regulatory molecules for disease detection and
treatment. 

The invention provides an isolated polynucleotide comprising a polynucleotide sequence 
selected from the group consisting of a) a polynucleotide sequence selected from the group consisting 
of SEQ ID NO: 1-14; b) a naturally occurring polynucleotide sequence having at least 90% sequence 
identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-14; c) a 
polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); 
and e) an RNA equivalent of a) through d). In one alternative, the polynucleotide comprises a 
polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-14. In another
alternative, the polynucleotide comprises at least 60 contiguous nucleotides of a polynucleotide
sequence selected from the group consisting of a) a polynucleotide sequence selected from the group
consisting of SEQ ID NO: 1-14; b) a naturally occurring polynucleotide sequence having at least 90%
sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-14;
c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary
to b); and e) an RNA equivalent of a) through d). The invention further provides a composition for
the detection of expression of disease detection and treatment molecule polynucleotides comprising at
least one isolated polynucleotide comprising a polynucleotide sequence selected from the group
consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-14; b)
a naturally occurring polynucleotide sequence having at least 90% sequence identity to a
polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-14; c) a polynucleotide
sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA
equivalent of a) through d); and a detectable label.

The invention also provides a method for detecting a target polynucleotide in a sample, said
target polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a
polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-14; b) a naturally
occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide
sequence selected from the group consisting of SEQ ID NO: 1-14; c) a polynucleotide sequence
complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent
of a) through d). The method comprises a) hybridizing the sample with a probe comprising at least 20
contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the
sample, and which probe specifically hybridizes to said target polynucleotide, under conditions
whereby a hybridization complex is formed between said probe and said target polynucleotide, and b)
detecting the presence or absence of said hybridization complex, and, optionally, if present, the 
amount thereof. In one alternative, the probe comprises at least 30 contiguous nucleotides. In 
another alternative, the probe comprises at least 60 contiguous nucleotides. 

The invention further provides a recombinant polynucleotide comprising a promoter sequence 
operably linked to an isolated polynucleotide comprising a polynucleotide sequence selected from the 
group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-14;
b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a
polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-14; c) a polynucleotide
sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA
equivalent of a) through d). In one alternative, the invention provides a cell transformed with the
recombinant polynucleotide. In another alternative, the invention provides a transgenic organism
comprising the recombinant polynucleotide. In a further alternative, the invention provides a method
for producing a disease detection and treatment molecule polypeptide, the method comprising a)
culturing a cell under conditions suitable for expression of the disease detection and treatment
molecule polypeptide, wherein said cell is transformed with the recombinant polynucleotide, and b)
recovering the disease detection and treatment molecule polypeptide so expressed.

WO 00/75298    PCT/US00/15344

The invention also provides a purified disease detection and treatment molecule polypeptide
(MDDT) encoded by at least one polynucleotide comprising a polynucleotide sequence selected from
the group consisting of SEQ ID NO: 1-14. Additionally, the invention provides an isolated antibody
which specifically binds to the disease detection and treatment molecule polypeptide. The invention
further provides a method of identifying a test compound which specifically binds to the disease
detection and treatment molecule polypeptide, the method comprising the steps of a) providing a test
compound; b) combining the disease detection and treatment molecule polypeptide with the test
compound for a sufficient time and under suitable conditions for binding; and c) detecting binding of
the disease detection and treatment molecule polypeptide to the test compound, thereby identifying
the test compound which specifically binds the disease detection and treatment molecule polypeptide.

The invention further provides a microarray wherein at least one element of the microarray is
an isolated polynucleotide comprising at least 60 contiguous nucleotides of a polynucleotide
comprising a polynucleotide sequence selected from the group consisting of a) a polynucleotide
sequence selected from the group consisting of SEQ ID NO: 1-14; b) a naturally occurring
polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected
from the group consisting of SEQ ID NO: 1-14; c) a polynucleotide sequence complementary to a); d)
a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d). The
invention also provides a method of using the microarray for generating a transcript image of a
sample which contains polynucleotides. The method comprises a) labeling the polynucleotides of the
sample, b) contacting the elements of the microarray with the labeled polynucleotides of the sample
under conditions suitable for the formation of a hybridization complex, and c) quantifying the
expression of the polynucleotides in the sample.

Additionally, the invention provides a method for screening a compound for effectiveness in 
altering expression of a target polynucleotide, wherein said target polynucleotide comprises a 
polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected 
from the group consisting of SEQ ID NO: 1-14; b) a naturally occurring polynucleotide sequence 
having at least 90% sequence identity to a polynucleotide sequence selected from the group
consisting of SEQ ID NO: 1-14; c) a polynucleotide sequence complementary to a); d) a 
polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d). The 
method comprises a) exposing a sample comprising the target polynucleotide to a compound, and b) 
detecting altered expression of the target polynucleotide. 

The invention further provides a method for detecting a target polynucleotide in a sample for
toxicity testing of a compound, said target polynucleotide comprising a polynucleotide sequence
selected from the group consisting of a) a polynucleotide sequence selected from the group consisting
of SEQ ID NO: 1-14; b) a naturally occurring polynucleotide sequence having at least 90% sequence
identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO: 1-14; c) a
polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b);
and e) an RNA equivalent of a) through d). The method comprises a) hybridizing the sample with a
probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said
target polynucleotide in the sample, and which probe specifically hybridizes to said target
polynucleotide, under conditions whereby a hybridization complex is formed between said probe and
said target polynucleotide, b) detecting the presence or absence of said hybridization complex, and,
optionally, if present, the amount thereof, and c) comparing the presence, absence or amount of said
target polynucleotide in a first biological sample and a second biological sample, wherein said first
biological sample has been contacted with said compound, and said second sample is a control,
whereby a change in presence, absence or amount of said target polynucleotide in said first sample, as
compared with said second sample, is indicative of toxic response to said compound.
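
By way of illustration only, the comparison in step c) may be sketched in Python as follows. The sketch assumes the amount of each target polynucleotide has already been quantified (for example, as a normalized hybridization signal) in the compound-treated and control samples. The function names, the pseudocount, and the two-fold threshold are hypothetical choices that are not specified herein.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """Log2 ratio of the treated amount to the control amount.

    The pseudocount (a hypothetical choice) keeps the ratio finite when a
    target is absent (amount 0) from one of the samples.
    """
    if treated < 0 or control < 0:
        raise ValueError("amounts must be non-negative")
    return math.log2((treated + pseudocount) / (control + pseudocount))


def flag_toxic_response(treated: dict[str, float], control: dict[str, float],
                        fold_threshold: float = 2.0) -> dict[str, float]:
    """Return {target: log2 fold change} for targets whose amount changed by at
    least fold_threshold (up or down) in the compound-treated sample.

    A target missing from one sample is treated as absent (amount 0) there.
    """
    cutoff = math.log2(fold_threshold)
    flagged = {}
    for target in sorted(set(treated) | set(control)):
        change = log2_fold_change(treated.get(target, 0.0), control.get(target, 0.0))
        if abs(change) >= cutoff:
            flagged[target] = change
    return flagged
```

For example, with hypothetical amounts, flag_toxic_response({"SEQ1": 7.0, "SEQ2": 1.0}, {"SEQ1": 1.0, "SEQ2": 1.0}) flags only SEQ1, with a log2 fold change of 2.0. A symmetric log-scale cutoff treats increases and decreases in amount alike, consistent with the method's reference to any change in presence, absence, or amount.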

DESCRIPTION OF THE TABLES 

Table 1 shows the sequence identification numbers (SEQ ID NO:s) and template
identification numbers (template IDs) corresponding to the polynucleotides of the present invention, 
along with their GenBank hits (GI Numbers), probability scores, and functional annotations 
corresponding to the GenBank hits. 

Table 2 shows the sequence identification numbers (SEQ ID NO:s) and template 
identification numbers (template IDs) corresponding to the polynucleotides of the present invention, 
along with polynucleotide segments of each template sequence as defined by the indicated "start" and 
"stop" nucleotide positions. The reading frames of the polynucleotide segments and the Pfam hits, 
Pfam descriptions, and E-values corresponding to the polypeptide domains encoded by the 
polynucleotide segments are indicated. 

Table 3 shows the sequence identification numbers (SEQ ID NO:s) and template 
identification numbers (template IDs) corresponding to the polynucleotides of the present invention, 
along with polynucleotide segments of each template sequence as defined by the indicated "start" and 
"stop" nucleotide positions. The reading frames of the polynucleotide segments are shown, and the 
polypeptides encoded by the polynucleotide segments constitute either signal peptide (SP) or 
transmembrane (TM) domains, as indicated. 

Table 4 shows the sequence identification numbers (SEQ ID NO:s) and template 
identification numbers (template IDs) corresponding to the polynucleotides of the present invention, 
along with component sequence identification numbers (component IDs) corresponding to each 
template. The component sequences, which were used to assemble the template sequences, are 
defined by the indicated "start" and "stop" nucleotide positions along each template. 

Table 5 summarizes the bioinformatics tools which are useful for analysis of the 
polynucleotides of the present invention. The first column of Table 5 lists analytical tools, programs,
and algorithms, the second column provides brief descriptions thereof, the third column presents
appropriate references, all of which are incorporated by reference herein in their entirety, and the
fourth column presents, where applicable, the scores, probability values, and other parameters used to
evaluate the strength of a match between two sequences (the higher the score, the greater the
homology between two sequences).



DETAILED DESCRIPTION OF THE INVENTION 
Before the nucleic acid sequences and methods are presented, it is to be understood that this
invention is not limited to the particular machines, methods, and materials described. Although
particular embodiments are described, machines, methods, and materials similar or equivalent to these
embodiments may be used to practice the invention. The preferred machines, methods, and materials 
set forth are not intended to limit the scope of the invention which is limited only by the appended 
claims. 

The singular forms "a", "an", and "the" include plural reference unless the context clearly
dictates otherwise. All technical and scientific terms have the meanings commonly understood by 
one of ordinary skill in the art. All publications are incorporated by reference for the purpose of 
describing and disclosing the cell lines, vectors, and methodologies which are presented and which 
might be used in connection with the invention. Nothing in the specification is to be construed as an 
admission that the invention is not entitled to antedate such disclosure by virtue of prior invention. 

Definitions 

As used herein, the lower case "mddt" refers to a nucleic acid sequence, while the uppercase
"MDDT" refers to an amino acid sequence encoded by mddt. A "full-length" mddt refers to a nucleic
acid sequence containing the entire coding region of a gene endogenously expressed in human tissue. 

"Adjuvants" are materials such as Freund's adjuvant, mineral gels (aluminum hydroxide), and 
surface active substances (lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole 
limpet hemocyanin, and dinitrophenol) which may be administered to increase a host's 
immunological response. 

"Allele" refers to an alternative form of a nucleic acid sequence. Alleles result from a 
"mutation," a change or an alternative reading of the genetic code. Any given gene may have none, 
one, or many allelic forms. Mutations which give rise to alleles include deletions, additions, or
substitutions of nucleotides. Each of these changes may occur alone, or in combination with the 
others, one or more times in a given nucleic acid sequence. The present invention encompasses 
allelic mddt. 

"Amino acid sequence" refers to a peptide, a polypeptide, or a protein of either natural or 
synthetic origin. The amino acid sequence is not limited to the complete, endogenous amino acid
sequence and may be a fragment, epitope, variant, or derivative of a protein expressed by a nucleic 
acid sequence. 

"Amplification" refers to the production of additional copies of a sequence and is carried out 
using polymerase chain reaction (PCR) technologies well known in the art.

"Antibody" refers to intact molecules as well as to fragments thereof, such as Fab, F(ab')2,
and Fv fragments, which are capable of binding the epitopic determinant. Antibodies that bind
MDDT polypeptides can be prepared using intact polypeptides or using fragments containing small
peptides of interest as the immunizing antigen. The polypeptide or peptide used to immunize an
animal (e.g., a mouse, a rat, or a rabbit) can be derived from the translation of RNA, or synthesized
chemically, and can be conjugated to a carrier protein if desired. Commonly used carriers that are
chemically coupled to peptides include bovine serum albumin, thyroglobulin, and keyhole limpet
hemocyanin (KLH). The coupled peptide is then used to immunize the animal.

"Antisense sequence" refers to a sequence capable of specifically hybridizing to a target
sequence. The antisense sequence may include DNA, RNA, or any nucleic acid mimic or analog such
as peptide nucleic acid (PNA); oligonucleotides having modified backbone linkages such as
phosphorothioates, methyl phosphonates, or benzylphosphonates; oligonucleotides having modified
sugar groups such as 2'-methoxyethyl sugars or 2'-methoxyethoxy sugars; or oligonucleotides having
modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or 7-deaza-2'-deoxyguanosine.

"Antisense technology" refers to any technology which relies on the specific hybridization of 
an antisense sequence to a target sequence. 

A "bin" is a portion of computer memory space used by a computer program for storage of 
data, and bounded in such a manner that data stored in a bin may be retrieved by the program.

"Biologically active" refers to an amino acid sequence having a structural, regulatory, or
biochemical function of a naturally occurring amino acid sequence. 

"Clone joining" is a process for combining gene bins based upon the bins' containing 
sequence information from the same clone. The sequences may assemble into a primary gene 
transcript as well as one or more splice variants.
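
By way of illustration only, clone joining may be sketched as a union-find (disjoint-set) merge in which bins that share any clone are combined, either directly or through a chain of shared clones. The sketch assumes each bin is represented simply as a set of clone identifiers; this data layout and the function name are hypothetical and are not taken from the programs used herein.

```python
def join_bins(bins: dict[str, set[str]]) -> list[set[str]]:
    """Group bin IDs so that bins sharing a clone (directly or transitively) merge."""
    parent = {bin_id: bin_id for bin_id in bins}

    def find(bin_id: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[bin_id] != bin_id:
            parent[bin_id] = parent[parent[bin_id]]
            bin_id = parent[bin_id]
        return bin_id

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first bin in which each clone was seen; any later bin that
    # contains the same clone is joined to that bin.
    first_bin: dict[str, str] = {}
    for bin_id, clones in bins.items():
        for clone in clones:
            if clone in first_bin:
                union(first_bin[clone], bin_id)
            else:
                first_bin[clone] = bin_id

    groups: dict[str, set[str]] = {}
    for bin_id in bins:
        groups.setdefault(find(bin_id), set()).add(bin_id)
    # Sort the groups for a deterministic result.
    return sorted(groups.values(), key=sorted)
```

With hypothetical bins {"b1": {"c1", "c2"}, "b2": {"c2"}, "b3": {"c3"}}, bins b1 and b2 are joined because both contain clone c2, while b3 remains separate.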

"Complementary" describes the relationship between two single-stranded nucleic acid 
sequences that anneal by base-pairing (5'-A-G-T-3' pairs with its complement 3'-T-C-A-5').
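
The base-pairing relationship may be illustrated with a short Python sketch, assuming unambiguous DNA sequences written with the letters A, C, G, and T; the function names are hypothetical. The complement of 5'-AGT-3', read 3' to 5', is TCA; written in the conventional 5' to 3' direction it is ACT (the reverse complement).

```python
_PAIR = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, read 3'->5' alongside the input strand."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("only unambiguous A, C, G, T are handled in this sketch")
    return seq.translate(_PAIR)


def reverse_complement(seq: str) -> str:
    """Complementary strand written in the conventional 5'->3' direction."""
    return complement(seq)[::-1]


def are_complementary(a: str, b: str) -> bool:
    """True if strands a and b (both written 5'->3') pair perfectly over their full length."""
    return len(a) == len(b) and reverse_complement(a) == b.upper()


print(complement("AGT"))          # TCA  (3'-T-C-A-5')
print(reverse_complement("AGT"))  # ACT  (5'-A-C-T-3')
```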

A "component sequence" is a nucleic acid sequence selected by a computer program such as 
PHRED and used to assemble a consensus or template sequence from one or more component 
sequences.

A "consensus sequence" or "template sequence" is a nucleic acid sequence which has been
assembled from overlapping sequences, using a computer program for fragment assembly such as the
GELVIEW fragment assembly system (Genetics Computer Group (GCG), Madison WI) or using a
relational database management system (RDMS). 
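
By way of illustration only, the assembly of a template sequence from overlapping sequences may be sketched with a toy greedy merge that repeatedly joins the pair of sequences sharing the longest exact suffix-prefix overlap. This is not the algorithm of GELVIEW or of any other named program; it assumes error-free, same-strand input and an arbitrary minimum overlap of 3 nucleotides, both hypothetical simplifications. The function names are likewise hypothetical.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0 if none reaches min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge overlapping reads into contigs; non-overlapping reads stay separate."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop any sequence wholly contained in another; it adds no new information.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = suffix_prefix_overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left to join; return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
```

For example, the hypothetical reads ATGCGTAC, GTACCTTA, and CTTAGGA overlap by four nucleotides at each junction and assemble into the single template ATGCGTACCTTAGGA.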

"Conservative amino acid substitutions" are those substitutions that, when made, least
interfere with the properties of the original protein, i.e., the structure and especially the function of
the protein is conserved and not significantly changed by such substitutions. The table below shows
amino acids which may be substituted for an original amino acid in a protein and which are regarded
as conservative substitutions.

Original Residue      Conservative Substitution
Ala                   Gly, Ser
Arg                   His, Lys
Asn                   Asp, Gln, His
Asp                   Asn, Glu
Cys                   Ala, Ser
Gln                   Asn, Glu, His
Glu                   Asp, Gln, His
Gly                   Ala
His                   Asn, Arg, Gln, Glu
Ile                   Leu, Val
Leu                   Ile, Val
Lys                   Arg, Gln, Glu
Met                   Leu, Ile
Phe                   His, Met, Leu, Trp, Tyr
Ser                   Cys, Thr
Thr                   Ser, Val
Trp                   Phe, Tyr
Tyr                   His, Phe, Trp
Val                   Ile, Leu, Thr

Conservative substitutions generally maintain (a) the structure of the polypeptide backbone in 
the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge 
or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain.
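
The table of conservative substitutions may be applied programmatically, for example to flag which differences between an original and a variant polypeptide are conservative. The Python sketch below assumes the two sequences are already aligned position by position (no insertions or deletions) and written in three-letter codes; the function name is hypothetical. The table is read in one direction only (original residue to substitute), so, for example, Phe to His is listed as conservative while His to Phe is not.

```python
# Original residue -> residues regarded as conservative substitutions (from the table).
CONSERVATIVE = {
    "Ala": {"Gly", "Ser"},
    "Arg": {"His", "Lys"},
    "Asn": {"Asp", "Gln", "His"},
    "Asp": {"Asn", "Glu"},
    "Cys": {"Ala", "Ser"},
    "Gln": {"Asn", "Glu", "His"},
    "Glu": {"Asp", "Gln", "His"},
    "Gly": {"Ala"},
    "His": {"Asn", "Arg", "Gln", "Glu"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"His", "Met", "Leu", "Trp", "Tyr"},
    "Ser": {"Cys", "Thr"},
    "Thr": {"Ser", "Val"},
    "Trp": {"Phe", "Tyr"},
    "Tyr": {"His", "Phe", "Trp"},
    "Val": {"Ile", "Leu", "Thr"},
}


def classify_substitutions(original: list[str], variant: list[str]) -> list[tuple[int, str, str, str]]:
    """Return (1-based position, original, variant, kind) for each differing position."""
    if len(original) != len(variant):
        raise ValueError("sequences must be aligned without insertions or deletions")
    changes = []
    for position, (old, new) in enumerate(zip(original, variant), start=1):
        if old == new:
            continue
        kind = "conservative" if new in CONSERVATIVE.get(old, set()) else "non-conservative"
        changes.append((position, old, new, kind))
    return changes
```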

"Deletion" refers to a change in either a nucleic or amino acid sequence in which at least one
nucleotide or amino acid residue, respectively, is absent. 

"Derivative" refers to the chemical modification of a nucleic acid sequence, such as by 
replacement of hydrogen by an alkyl, acyl, amino, hydroxyl, or other group. 

The terms "element" and "array element" refer to a polynucleotide, polypeptide, or other
chemical compound having a unique and defined position on a microarray.

"E-value" refers to the statistical probability that a match between two sequences occurred by
chance.

A "fragment" is a unique portion of mddt or MDDT which is identical in sequence to but
shorter in length than the parent sequence. A fragment may comprise up to the entire length of the
defined sequence, minus one nucleotide/amino acid residue. For example, a fragment may comprise
from 10 to 1000 contiguous amino acid residues or nucleotides. A fragment used as a probe, primer,
antigen, therapeutic molecule, or for other purposes, may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50,
60, 75, 100, 150, 250 or at least 500 contiguous amino acid residues or nucleotides in length.
Fragments may be preferentially selected from certain regions of a molecule. For example, a
polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first
250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined
sequence. Clearly these lengths are exemplary, and any length that is supported by the specification,
including the Sequence Listing and the figures, may be encompassed by the present embodiments.

A fragment of mddt comprises a region of unique polynucleotide sequence that specifically
identifies mddt, for example, as distinct from any other sequence in the same genome. A fragment of
mddt is useful, for example, in hybridization and amplification technologies and in analogous
methods that distinguish mddt from related polynucleotide sequences. The precise length of a
fragment of mddt and the region of mddt to which the fragment corresponds are routinely
determinable by one of ordinary skill in the art based on the intended purpose for the fragment.
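
By way of illustration only, candidate fragments that distinguish mddt from other sequences may be located by listing every window of a chosen length k in mddt that occurs nowhere in a set of background sequences. The sketch below assumes exact matching on one strand only; a practical probe design would also screen the reverse complement, tolerate mismatches, and use a much longer k (for example, at least 20 nucleotides). The function name and parameters are hypothetical.

```python
def unique_fragments(target: str, background: list[str], k: int) -> list[tuple[int, str]]:
    """Return (0-based start, fragment) for each length-k window of target absent from background."""
    if k < 1:
        raise ValueError("k must be at least 1")
    target = target.upper()
    seen: set[str] = set()
    # Index every length-k window occurring in the background sequences.
    for seq in background:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    # Keep only target windows never seen in the background.
    return [(i, target[i:i + k]) for i in range(len(target) - k + 1)
            if target[i:i + k] not in seen]
```

With a toy target ACGTAC, a background containing CGTA, and k = 3, only ACG (position 0) and TAC (position 3) are unique.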

A fragment of MDDT is encoded by a fragment of mddt. A fragment of MDDT comprises a 
region of unique amino acid sequence that specifically identifies MDDT. For example, a fragment of 
MDDT is useful as an immunogenic peptide for the development of antibodies that specifically 
recognize MDDT. The precise length of a fragment of MDDT and the region of MDDT to which the 
fragment corresponds are routinely determinable by one of ordinary skill in the art based on the 
intended purpose for the fragment.

A "full length" nucleotide sequence is one containing at least a start site for translation to a
protein sequence, followed by an open reading frame and a stop site, and encoding a "full length" 
polypeptide. 

"Hit" refers to a sequence whose annotation will be used to describe a given template.
Criteria for selecting the top hit are as follows: if the template has one or more exact nucleic acid 
matches, the top hit is the exact match with highest percent identity. If the template has no exact 
matches but has significant protein hits, the top hit is the protein hit with the lowest E-value. If the 
template has no significant protein hits, but does have significant non-exact nucleotide hits, the top hit 
is the nucleotide hit with the lowest E-value. 
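
The hit-selection criteria above may be expressed as a short Python sketch. The Hit fields and the significance cutoff on the E-value are hypothetical (the criteria do not state what E-value counts as significant), and the sketch does not reproduce the program actually used to annotate the templates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    accession: str
    kind: str               # "nucleotide" or "protein"
    exact: bool             # True for an exact nucleic acid match
    percent_identity: float
    e_value: float


def top_hit(hits: list[Hit], significant_e: float = 1e-5) -> Optional[Hit]:
    """Apply the three ordered criteria; return None if no hit qualifies.

    significant_e is a hypothetical cutoff: hits with E-value <= it are
    treated as significant.
    """
    # 1) Exact nucleic acid matches: the highest percent identity wins.
    exact = [h for h in hits if h.kind == "nucleotide" and h.exact]
    if exact:
        return max(exact, key=lambda h: h.percent_identity)
    # 2) Otherwise, significant protein hits: the lowest E-value wins.
    protein = [h for h in hits if h.kind == "protein" and h.e_value <= significant_e]
    if protein:
        return min(protein, key=lambda h: h.e_value)
    # 3) Otherwise, significant non-exact nucleotide hits: the lowest E-value wins.
    nucleotide = [h for h in hits
                  if h.kind == "nucleotide" and not h.exact and h.e_value <= significant_e]
    if nucleotide:
        return min(nucleotide, key=lambda h: h.e_value)
    return None
```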

"Homology" refers to sequence similarity either between a reference nucleic acid sequence
and at least a fragment of an mddt or between a reference amino acid sequence and a fragment of an 
MDDT. 

"Hybridization" refers to the process by which a strand of nucleotides anneals with a 
complementary strand through base pairing. Specific hybridization is an indication that two nucleic 
acid sequences share a high degree of identity. Specific hybridization complexes form under defined 
annealing conditions, and remain hybridized after the "washing" step. The defined hybridization
conditions include the annealing conditions and the washing step(s), the latter of which is particularly
important in determining the stringency of the hybridization process, with more stringent conditions
allowing less non-specific binding, i.e., binding between pairs of nucleic acid probes that are not 
perfectly matched. Permissive conditions for annealing of nucleic acid sequences are routinely 
determinable and may be consistent among hybridization experiments, whereas wash conditions may 
be varied among experiments to achieve the desired stringency. 

Generally, stringency of hybridization is expressed with reference to the temperature under
which the wash step is carried out. Such wash temperatures are typically selected to be about 5°C to
20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength
and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target
sequence hybridizes to a perfectly matched probe. An equation for calculating Tm and conditions for
nucleic acid hybridization are well known and can be found in Sambrook et al., 1989, Molecular
Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY;
specifically see volume 2, chapter 9.
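
As a numerical illustration, the sketch below estimates Tm with one commonly cited empirical approximation for DNA:DNA hybrids, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%G+C) - 600/N, where [Na+] is the molar sodium concentration and N is the probe length, and then derives the wash-temperature window lying 5°C to 20°C below Tm. The formula is an assumption offered for illustration and is not necessarily the equation of the cited reference; it ignores formamide and mismatches and is only approximate for short oligomers. The function names are hypothetical.

```python
import math


def estimate_tm(probe: str, na_molar: float) -> float:
    """Approximate Tm (°C) of a perfectly matched DNA:DNA hybrid.

    Assumed empirical formula: 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 600/N.
    """
    probe = probe.upper()
    if not probe or set(probe) - set("ACGT"):
        raise ValueError("probe must be a non-empty A/C/G/T sequence")
    if na_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    gc_percent = 100.0 * (probe.count("G") + probe.count("C")) / len(probe)
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / len(probe)


def wash_window(tm: float, min_below: float = 5.0, max_below: float = 20.0) -> tuple[float, float]:
    """(lowest, highest) wash temperature lying min_below to max_below °C under Tm."""
    return tm - max_below, tm - min_below
```

For a hypothetical 20-nucleotide probe with 50% G+C at 1 M Na+, the estimate is 81.5 + 0 + 20.5 - 30 = 72°C, giving a wash window of roughly 52°C to 67°C.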

High stringency conditions for hybridization between polynucleotides of the present
invention include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1% SDS,
for 1 hour. Alternatively, temperatures of about 65°C, 60°C, or 55°C may be used. SSC
concentration may be varied from about 0.2 to 2 x SSC, with SDS being present at about 0.1%.
Typically, blocking reagents are used to block non-specific hybridization. Such blocking reagents
include, for instance, denatured salmon sperm DNA at about 100-200 µg/ml. Useful variations on
these conditions will be readily apparent to those skilled in the art. Hybridization, particularly under
high stringency conditions, may be suggestive of evolutionary similarity between the nucleotides.
Such similarity is strongly indicative of a similar role for the nucleotides and their resultant proteins.

Other parameters, such as temperature, salt concentration, and detergent concentration may 
be varied to achieve the desired stringency. Denaturants, such as formamide at a concentration of
about 35-50% v/v, may also be used under particular circumstances, such as RNA:DNA
hybridizations. Appropriate hybridization conditions are routinely determinable by one of ordinary
skill in the art. 

"Immunogenic" describes the potential for a natural, recombinant, or synthetic peptide,
epitope, polypeptide, or protein to induce antibody production in appropriate animals, cells, or cell
lines.

"Insertion" or "addition" refers to a change in either a nucleic or amino acid sequence in
which at least one nucleotide or residue, respectively, is added to the sequence. 

"Labeling" refers to the covalent or noncovalent joining of a polynucleotide, polypeptide, or 
antibody with a reporter molecule capable of producing a detectable or measurable signal. 

"Microarray" is any arrangement of nucleic acids, amino acids, antibodies, etc., on a 
substrate. The substrate may be a solid support such as beads, glass, paper, nitrocellulose, nylon, or 
an appropriate membrane. 

"Linkers" are short stretches of nucleotide sequence which may be added to a vector or an
mddt to create restriction endonuclease sites to facilitate cloning. "Polylinkers" are engineered to
incorporate multiple restriction enzyme sites and to provide for the use of enzymes which leave 5' or
3' overhangs (e.g., BamHI, EcoRI, and HindIII) and those which provide blunt ends (e.g., EcoRV,
SnaBI, and StuI).

"Naturally occurring" refers to an endogenous polynucleotide or polypeptide that may be 
isolated from viruses or prokaryotic or eukaryotic cells. 

"Nucleic acid sequence" refers to the specific order of nucleotides joined by phosphodiester 
bonds in a linear, polymeric arrangement. Depending on the number of nucleotides, the nucleic acid 
sequence can be considered an oligomer, oligonucleotide, or polynucleotide. The nucleic acid can be 
DNA, RNA, or any nucleic acid analog, such as PNA, may be of genomic or synthetic origin, may be 
either double-stranded or single-stranded, and can represent either the sense or antisense 
(complementary) strand. 
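The relationship between a sense strand and its antisense (complementary) strand can be sketched as follows. This is a minimal Python illustration, assuming DNA written in the one-letter code with N for an unidentified base; the function name `antisense` is hypothetical, and RNA or analogs such as PNA are not handled.

```python
# Complement table for DNA; N (unidentified base) pairs with N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def antisense(sense: str) -> str:
    """Return the complementary (antisense) strand, written 5'->3'."""
    # Complement each base, then reverse so the result reads 5'->3'.
    return sense.translate(COMPLEMENT)[::-1]


print(antisense("ATGGCCAAGTTN"))  # NAACTTGGCCAT
```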

"Oligomer" refers to a nucleic acid sequence of at least about 6 nucleotides and as many as 
about 60 nucleotides, preferably about 15 to 40 nucleotides, and most preferably between about 20 
and 30 nucleotides, that may be used in hybridization or amplification technologies. Oligomers may 
be used as, e.g., primers for PGR, and are usually chemically synthesized. 

"Operably linked" refers to the situation in which a first nucleic acid sequence is placed in a 
functional relationship with the second nucleic acid sequence. For instance, a promoter is operably 
linked to a coding sequence- if the promoter affects the transcription or expression of the coding 
sequence. Generally, operably linked DNA sequences may be in close proximity or contiguous and, 
where necessary to join two protein coding regions, in the same reading frame. 

"Peptide nucleic acid" (PNA) refers to a DNA nnimic in which nucleotide bases are attached 
to a pseudopeptide backbone to increase stability. PNAs, also designated antigene agents, can 
prevent gene expression by targeting complementary messenger RNA. 
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The phrases "percent identity*' and identity", as applied to polynucleotide sequences, 
refer to the percentage of residue matches between at least two polynucleotide sequences aligned 
using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible 
way, gaps in the sequences being compared in order to optimize ahgnment between two sequences, 
and therefore achieve a more meaningful comparison of the two sequences. 
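To make the residue-match count concrete, the sketch below computes percent identity for a pair of sequences that have already been aligned. It is a minimal Python example, not the CLUSTAL V or BLAST calculation. It assumes gaps are written as '-' and divides the number of identical columns by the total alignment length, which is only one of several normalizations in use. `percent_identity` is a hypothetical helper name.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumes both strings are already aligned (equal length, gaps marked with
    `gap`). The denominator is the full alignment length; other programs
    normalize differently, so their values are not interchangeable.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap  # a gap never counts as a match
    )
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACGT-ACGT", "ACGTTACCT"))
```

For the example pair, 7 of the 9 aligned columns are identical, so the function returns about 77.8.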

Percent identity between polynucleotide sequences may be determined using the default
parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e
sequence alignment program. This program is part of the LASERGENE software package, a suite of
molecular biological analysis programs (DNASTAR, Madison WI). CLUSTAL V is described in
Higgins, D.G. and Sharp, P.M. (1989) CABIOS 5:151-153 and in Higgins, D.G. et al. (1992)
CABIOS 8:189-191. For pairwise alignments of polynucleotide sequences, the default parameters are
set as follows: Ktuple=2, gap penalty=5, window=4, and "diagonals saved"=4. The "weighted"
residue weight table is selected as the default. Percent identity is reported by CLUSTAL V as the
"percent similarity" between aligned polynucleotide sequence pairs.

Alternatively, a suite of commonly used and freely available sequence comparison algorithms
is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment
Search Tool (BLAST) (Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410), which is available
from several sources, including the NCBI, Bethesda, MD, and on the internet at
http://www.ncbi.nlm.nih.gov/BLAST/. The BLAST software suite includes various sequence
analysis programs including "blastn," which is used to determine alignment between a known
polynucleotide sequence and other sequences on a variety of databases. Also available is a tool called
"BLAST 2 Sequences" that is used for direct pairwise comparison of two nucleotide sequences.
"BLAST 2 Sequences" can be accessed and used interactively at
http://www.ncbi.nlm.nih.gov/gorf/bl2/. The "BLAST 2 Sequences" tool can be used for both blastn
and blastp (discussed below). BLAST programs are commonly used with gap and other parameters
set to default settings. For example, to compare two nucleotide sequences, one may use blastn with
the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such default
parameters may be, for example:

Matrix: BLOSUM62 

Reward for match: 1

Penalty for mismatch: -2 

Open Gap: 5 and Extension Gap: 2 penalties 

Gap X drop-off: 50 

Expect: 10 

Word Size: 11

Filter: on
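To show how match, mismatch, and gap parameters of the kind listed above combine into a single alignment score, here is a minimal Python sketch that scores a pre-computed nucleotide alignment. It assumes the affine convention in which a gap of length k costs the open penalty plus k times the extension penalty. It does not perform a BLAST search, and it ignores the matrix, X drop-off, expect, word size, and filter settings. `alignment_score` is a hypothetical name.

```python
def alignment_score(a: str, b: str, reward: int = 1, penalty: int = -2,
                    gap_open: int = 5, gap_extend: int = 2, gap: str = "-") -> int:
    """Score a pre-computed pairwise nucleotide alignment.

    Matches add `reward` and mismatches add `penalty`. A run of k gap
    characters in one sequence costs gap_open + k * gap_extend (an assumed
    affine convention, used here for illustration).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_in = None  # which sequence ("a" or "b") holds the current gap run
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue  # a column of gaps in both sequences carries no information
        if x == gap or y == gap:
            side = "a" if x == gap else "b"
            if side != gap_in:
                score -= gap_open  # a new gap run starts in this column
                gap_in = side
            score -= gap_extend  # every gap column pays the extension cost
        else:
            score += reward if x == y else penalty
            gap_in = None
    return score


print(alignment_score("ACGT-ACGT", "ACGTTACCT"))  # -2
```

For this pair, seven matches (+7), one mismatch (-2), and one single-base gap (-7) give a score of -2.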

Percent identity may be measured over the length of an entire defined sequence, for example,
as defined by a particular SEQ ID number, or may be measured over a shorter length, for example,
over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at
least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous
nucleotides. Such lengths are exemplary only, and it is understood that any fragment length
supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a
length over which percentage identity may be measured.

Nucleic acid sequences that do not show a high degree of identity may nevertheless encode
similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes
in nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid
sequences that all encode substantially the same protein.
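Codon degeneracy can be demonstrated by translating two different nucleotide sequences. The minimal Python sketch below assumes the standard genetic code and reading frame 0. The two coding sequences are hypothetical: they differ at 4 of 15 positions but use only synonymous codons.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate in frame 0; '*' marks a stop codon, 'X' a codon with an unidentified base."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


# Hypothetical coding sequences built from synonymous codons.
seq_a = "ATGGCTAAATTTGGT"
seq_b = "ATGGCGAAGTTCGGC"
print(translate(seq_a), translate(seq_b))  # MAKFG MAKFG
```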

The phrases "percent identity" and "% identity," as applied to polypeptide sequences, refer to
the percentage of residue matches between at least two polypeptide sequences aligned using a
standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some
alignment methods take into account conservative amino acid substitutions. Such conservative
substitutions, explained in more detail above, generally preserve the hydrophobicity and acidity of the
substituted residue, thus preserving the structure (and therefore function) of the folded polypeptide.

Percent identity between polypeptide sequences may be determined using the default
parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e
sequence alignment program (described and referenced above). For pairwise alignments of
polypeptide sequences using CLUSTAL V, the default parameters are set as follows: Ktuple=1, gap
penalty=3, window=5, and "diagonals saved"=5. The PAM250 matrix is selected as the default
residue weight table. As with polynucleotide alignments, the percent identity is reported by
CLUSTAL V as the "percent similarity" between aligned polypeptide sequence pairs.

Alternatively, the NCBI BLAST software suite may be used. For example, for a pairwise
comparison of two polypeptide sequences, one may use the "BLAST 2 Sequences" tool Version 2.0.9
(May-07-1999) with blastp set at default parameters. Such default parameters may be, for example:
Matrix: BLOSUM62

Open Gap: 11 and Extension Gap: 1 penalties
Gap X drop-off: 50
Expect: 10
Word Size: 3
Filter: on

Percent identity may be measured over the length of an entire defined polypeptide sequence,
for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for
example, over the length of a fragment taken from a larger, defined polypeptide sequence, for
instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least
150 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment
length supported by the sequences shown herein, in figures or Sequence Listings, may be used to
describe a length over which percentage identity may be measured.

"Post-translational modification" of an MDDT may involve Jipidation, glycosylation, 
phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in 
the art. These processes may occur synthetically or biochemically. Biochemical modifications will 
vary by cell type depending on the enzymatic milieu and the MDDT. 

"Probe" refers to mddt or fragments thereof, which are used to detect identical, allelic or 
related nucleic acid sequences. Probes are isolated oligonucleotides or polynucleotides attached to a 
detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, 
chemiluminescent agents, and enzymes. "Primers" are short nucleic acids, usually DNA 
oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing. 
The primer may then be extended along the target DNA strand by a DNA polymerase enzyme. 
Primer pairs can be used for amplification (and identification) of a nucleic acid sequence, e.g., by the 
polymerase chain reaction (PCR). 
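How a primer pair defines the region amplified by PCR can be sketched in a few lines. This minimal Python example assumes exact, mismatch-free annealing, although real primers tolerate some mismatches. The template and primer sequences are hypothetical, and `find_amplicon` is an illustrative name, not a function from any cited program.

```python
from typing import Optional

# Watson-Crick complement table for DNA primers and templates.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse, giving the opposite strand 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the template segment spanned by a primer pair, or None.

    The forward primer is matched directly on the template. The reverse
    primer anneals to the opposite strand, so its reverse complement is
    searched for downstream of the forward primer. Only exact matches count.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Hypothetical template and primer pair, for illustration only.
template = "CCCATGCGTACGTTAGCAAGGCTTGACCC"
print(find_amplicon(template, "ATGCGTACG", "TCAAGCC"))  # ATGCGTACGTTAGCAAGGCTTGA
```

Here the forward primer matches the template at its fourth base, and the reverse complement of the reverse primer (GGCTTGA) lies downstream, so the returned amplicon spans 23 bases.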

Probes and primers as used in the present invention typically comprise at least 15 contiguous
nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also
be employed, such as probes and primers that comprise at least 20, 30, 40, 50, 60, 70, 80, 90, 100, or
at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may
be considerably longer than these examples, and it is understood that any length supported by the
specification, including the figures and Sequence Listing, may be used.

Methods for preparing and using probes and primers are described in the references, for
example Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold
Spring Harbor Press, Plainview NY; Ausubel et al., 1987, Current Protocols in Molecular Biology,
Greene Publ. Assoc. & Wiley-Intersciences, New York NY; Innis et al., 1990, PCR Protocols: A
Guide to Methods and Applications, Academic Press, San Diego CA. PCR primer pairs can be
derived from a known sequence, for example, by using computer programs intended for that purpose
such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge MA).

Oligonucleotides for use as primers are selected using software known in the art for such
purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to
100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to
5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer
selection programs have incorporated additional features for expanded capabilities. For example, the
PrimOU primer selection program (available to the public from the Genome Center at University of
Texas South West Medical Center, Dallas TX) is capable of choosing specific primers from
megabase sequences and is thus useful for designing primers on a genome-wide scope. The Primer3
primer selection program (available to the public from the Whitehead Institute/MIT Center for
Genome Research, Cambridge MA) allows the user to input a "mispriming library," in which
sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the
selection of oligonucleotides for microarrays. (The source code for the latter two primer selection
programs may also be obtained from their respective sources and modified to meet the user's specific
needs.) The PrimeGen program (available to the public from the UK Human Genome Mapping
Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments,
thereby allowing selection of primers that hybridize to either the most conserved or least conserved
regions of aligned nucleic acid sequences. Hence, this program is useful for identification of both
unique and conserved oligonucleotides and polynucleotide fragments. The oligonucleotides and
polynucleotide fragments identified by any of the above selection methods are useful in hybridization
technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to
identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of
oligonucleotide selection are not limited to those described above.
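The programs named above apply far more elaborate criteria, but the basic idea of screening candidate primers can be sketched as follows. This minimal Python example assumes two simple, commonly used rules of thumb: GC content between 40% and 60%, and a melting temperature estimated by the Wallace rule (2 °C per A or T plus 4 °C per G or C). The thresholds and function names are hypothetical, and the sketch does not reimplement OLIGO, PrimOU, Primer3, or PrimeGen.

```python
from typing import List, Tuple


def gc_fraction(oligo: str) -> float:
    """Fraction of bases that are G or C."""
    oligo = oligo.upper()
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


def wallace_tm(oligo: str) -> int:
    """Rough melting temperature in deg C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def candidate_primers(template: str, length: int = 20,
                      gc_min: float = 0.40, gc_max: float = 0.60,
                      tm_min: int = 56, tm_max: int = 64) -> List[Tuple[int, str, int]]:
    """Slide a window along the template; keep windows within the GC and Tm limits."""
    template = template.upper()
    picks = []
    for i in range(len(template) - length + 1):
        oligo = template[i:i + length]
        if "N" in oligo:
            continue  # skip windows containing unidentified bases
        tm = wallace_tm(oligo)
        if gc_min <= gc_fraction(oligo) <= gc_max and tm_min <= tm <= tm_max:
            picks.append((i, oligo, tm))
    return picks


print(candidate_primers("ACGT" * 5))  # [(0, 'ACGTACGTACGTACGTACGT', 60)]
```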

"Purified" refers to molecules, either polynucleotides or polypeptides, that are isolated or
separated from their natural environment and are at least 60% free, preferably at least 75% free, and
most preferably at least 90% free from other compounds with which they are naturally associated.

A "recombinant nucleic acid" is a sequence that is not naturally occurring or has a sequence 
that is made by an artificial combination of two or more otherwise separated segments of sequence. 
This anificial combination is often accomplished by chemical synthesis or, more commonly, by the 
artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques 
such as those described in Sambrook, supra . The term recombinant includes nucleic acids that have 
been altered solely by addition, substitution, or deletion of a ponion of the nucleic acid. Frequently, a 
recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter 
sequence. Such a recombinant nucleic acid may be part of a vector that is used, for example, to 
transform a cell. 

Alternatively, such recombinant nucleic acids may be part of a viral vector, e.g., based on a
vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is
expressed, inducing a protective immunological response in the mammal.

"Regulatory element" refers to a nucleic acid sequence from nontranslated regions of a gene,
and includes enhancers, promoters, introns, and 3' untranslated regions, which interact with host
proteins to carry out or regulate transcription or translation.

"Reporter" molecules are chemical or biochemical moieties used for labeling a nucleic acid, 
an amino acid, or an antibody. They include radionuclides; enzymes; fluorescent, chemi luminescent, 
or chromogenic agents; substrates; cofactors; inhibitors: magnetic panicles; and other moieties known 
in the art. 

An "RNA equivalent," in reference to a DNA sequence, is composed of the same linear 
sequence of nucleotides as the reference DNA sequence with the exception that ail occurrences of the 
nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose 
instead of deoxyribose. 
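In sequence terms, the RNA equivalent amounts to a single-letter substitution, as this minimal Python sketch shows. It assumes DNA written in the one-letter code. The change from deoxyribose to ribose has no representation in a sequence string. `rna_equivalent` is a hypothetical name.

```python
def rna_equivalent(dna: str) -> str:
    """Return the RNA equivalent of a DNA sequence in one-letter code.

    Only the base letters change (T -> U); the backbone sugar change is
    implicit and not represented in the string.
    """
    return dna.upper().replace("T", "U")


print(rna_equivalent("ATGGCTTAA"))  # AUGGCUUAA
```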

"Sample" is used in its broadest sense. Samples may contain nucleic or amino acids, 
antibodies, or other materials, and may be derived from any source (e.g., bodily fluids including, but 
not limited to, saliva, blood, and urine; chromosome(s), organelles, or membranes isolated from a 
cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; and cleared cells or tissues or 
blots or imprints from such cells or tissues). 

"Specific binding" or "specifically binding" refers to the interaction between a protein or 
peptide and its agonist, antibody, antagonist, or other binding partner. The interaction is dependent 
upon the presence of a panicular structure of the protein, e.g., the antigenic determinant or epitope, 
recognized by the binding molecule. For example, if an antibody is specific for epitope "A," the 
presence of a polypeptide containing epitope A, or the presence of free unlabeled A, in a reaction 
containing free labeled A and the antibody will reduce the amount of labeled A that binds to the 
antibody. 

"Substitution" refers to the replacement of at least one nucleotide or amino acid by a different 
nucleotide or amino acid. 

"Substrate" refers to any suitable rigid or semi-rigid suppon including, e.g., membranes, 
filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, 
microparticles or capillaries. The substrate can have a variety of surface forms, such as wells, 
trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound. 

A "transcript image" refers to the collective pattern of gene expression by a particular tissue 
or cell type under given conditions at a given time. 

"Transformation" refers to a process by which exogenous DNA enters a recipient cell. 
Transformation may occur under natural or artificial conditions using various methods well known in 
the art. Transformation may rely on any known method for the insenion of foreign nucleic acid 
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sequences into a prokaryotic or eukaryotic host cell. The method is selected based on the host cell 
being transformed. 

"Transformanis" include siabiy transformed cells in which the inserted DNA is capable of 
replication either as an autonomously replicating plasmid or as part of the host chromosome, as well 
as cells which transiently express inserted DNA or RNA. 

A "transgenic organism," as used herein, is any organism, including but not limited to animal 
and plants, in which one or more of the cells of the organism contains heterologous nucleic acid 
introduced by way of human intervention, such as by transgenic techniques well known in the art. 
The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of 
the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a 
recombinant virus. The term genetic manipuladon does not include classical cross-breeding, or in 
yitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule. The 
transgenic organisms contemplated in accordance with the present invention include bacteria 
cyanobacieria, fungi, and plants and animals. The isolated DNA of the present invention can be 
introduced into the host by methods known in the an, for example infection, transfection, 
transformation or transconjugation. Techniques for transferring the DNA of the present invention 
into such organisms are widely known and provided in references such as Sambrook at al. ( 1 989), 
supra > 

A "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having
at least 25% sequence identity to the particular nucleic acid sequence over a certain length of one of
the nucleic acid sequences using blastn with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999)
set at default parameters. Such a pair of nucleic acids may show, for example, at least 30%, at
least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or even at least 98% or
greater sequence identity over a certain defined length. The variant may result in "conservative"
amino acid changes which do not affect structural and/or chemical properties. A variant may be
described as, for example, an "allelic" (as defined above), "splice," "species," or "polymorphic"
variant. A splice variant may have significant identity to a reference molecule, but will generally
have a greater or lesser number of nucleotides due to alternate splicing of exons during mRNA
processing. The corresponding polypeptide may possess additional functional domains or lack
domains that are present in the reference molecule. Species variants are polynucleotide sequences
that vary from one species to another. The resulting polypeptides generally will have significant
amino acid identity relative to each other. A polymorphic variant is a variation in the polynucleotide
sequence of a particular gene between individuals of a given species. Polymorphic variants also may
encompass "single nucleotide polymorphisms" (SNPs) in which the polynucleotide sequence varies
by one base. The presence of SNPs may be indicative of, for example, a certain population, a disease
state, or a propensity for a disease state.
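In the simplest case, identifying candidate SNPs between two individuals' copies of a gene reduces to listing single-base differences between aligned sequences. The minimal Python sketch below assumes the two sequences are already aligned without gaps and skips positions containing an unidentified base (N). It does not assess sequencing quality or population frequency, and `single_base_differences` is a hypothetical name.

```python
from typing import List, Tuple


def single_base_differences(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, reference base, sample base) where two aligned,
    gap-free sequences differ; positions with an unidentified base (N) are skipped."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()), start=1)
        if ref != alt and "N" not in (ref, alt)
    ]


print(single_base_differences("ACGTACGT", "ACGAACGT"))  # [(4, 'T', 'A')]
```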

In an alternative, variants of the polynucleotides of the present invention may be generated
through recombinant methods. One possible method is a DNA shuffling technique such as
MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. Patent Number
5,837,458; Chang, C.-C. et al. (1999) Nat. Biotechnol. 17:793-797; Christians, F.C. et al. (1999) Nat.
Biotechnol. 17:259-264; and Crameri, A. et al. (1996) Nat. Biotechnol. 14:315-319) to alter or
improve the biological properties of MDDT, such as its biological or enzymatic activity or its ability
to bind to other molecules or compounds. DNA shuffling is a process by which a library of gene
variants is produced using PCR-mediated recombination of gene fragments. The library is then
subjected to selection or screening procedures that identify those gene variants with the desired
properties. These preferred variants may then be pooled and further subjected to recursive rounds of
DNA shuffling and selection/screening. Thus, genetic diversity is created through "artificial"
breeding and rapid molecular evolution. For example, fragments of a single gene containing random
point mutations may be recombined, screened, and then reshuffled until the desired properties are
optimized. Alternatively, fragments of a given gene may be recombined with fragments of
homologous genes in the same gene family, either from the same or different species, thereby
maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable
manner.
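The recombination step of DNA shuffling can be mimicked in silico to show how crossovers between homologous parents generate a library of chimeric variants. This minimal Python sketch is only a toy model. It assumes the parent sequences are aligned to equal length and draws crossover points uniformly at random. It does not model PCR-mediated reassembly, point mutation, or the selection/screening step. The function names are hypothetical, and the sketch does not represent the MOLECULARBREEDING process itself.

```python
import random
from typing import List


def shuffle_once(parents: List[str], n_crossovers: int, rng: random.Random) -> str:
    """Build one chimera by cutting at random points and picking a parent per segment.

    Adjacent segments may come from the same parent, so n_crossovers is an
    upper bound on the number of actual template switches.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to equal length")
    # Distinct internal cut points, so every segment is non-empty.
    cuts = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + cuts + [length]
    return "".join(rng.choice(parents)[start:end]
                   for start, end in zip(bounds, bounds[1:]))


def shuffled_library(parents: List[str], size: int,
                     n_crossovers: int = 2, seed: int = 0) -> List[str]:
    """Return `size` chimeric variants; a fixed seed makes the library reproducible."""
    rng = random.Random(seed)
    return [shuffle_once(parents, n_crossovers, rng) for _ in range(size)]


# Two hypothetical homologous parents, aligned to equal length.
parents = ["ACGTACGTAC", "AGGTTCGAAC"]
for variant in shuffled_library(parents, size=5, n_crossovers=3, seed=7):
    print(variant)
```

Each position of a chimera is inherited from one of the parents at that same position, which is the property a real shuffled library would be screened on. In an actual experiment, the selection step would then pool the best variants for another round.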

A "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having 
at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of 
the polypeptide sequences using blastp with the "BLAST 2 Sequences" tool Version 2.0.9 (May-07. 
1999) set at default parameters. Such a pair of polypeptides may show, for example, at least 50%, at 
least 60%. at least 70%, at least 80%, at least 90%, at least 95%, or at least 98% or greater sequence 
identity over a certain defined length of one of the polypeptides. 

THE INVENTION 

In a particular embodiment, cDNA sequences derived from human tissues and cell lines were
aligned based on nucleotide sequence identity and assembled into "consensus" or "template"
sequences which are designated by the template identification numbers (template IDs) in column 2 of
Table 1. The sequence identification numbers (SEQ ID NO:s) corresponding to the template IDs are
shown in column 1. The template sequences have similarity to GenBank sequences, or "hits," as
designated by the GI Numbers in column 3. The statistical probability of each GenBank hit is
indicated by a probability score in column 4, and the functional annotation corresponding to each
GenBank hit is listed in column 5.

The invention incorporates the nucleic acid sequences of these templates as disclosed in the
Sequence Listing and the use of these sequences in the diagnosis and treatment of disease states
characterized by defects in molecules for disease detection and treatment. The invention further
utilizes these sequences in hybridization and amplification technologies, and in particular, in
technologies which assess gene expression patterns correlated with specific cells or tissues and their
responses in vivo or in vitro to pharmaceutical agents, toxins, and other treatments. In this manner,
the sequences of the present invention are used to develop a transcript image for a particular cell or
tissue.

Derivation of Nucleic Acid Sequences

cDNA was isolated from libraries constructed using RNA derived from normal and diseased
human tissues and cell lines. The human tissues and cell lines used for cDNA library construction
were selected from a broad range of sources to provide a diverse population of cDNAs representative
of gene transcription throughout the human body. Descriptions of the human tissues and cell lines
used for cDNA library construction are provided in the LIFESEQ database (Incyte Genomics, Inc.
(Incyte), Palo Alto CA). Human tissues were broadly selected from, for example, cardiovascular,
dermatologic, endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural,
reproductive, and urologic sources.

Cell lines used for cDNA library construction were derived from, for example, leukemic
cells, teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells.
Such cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell
lines commonly used and available from public depositories (American Type Culture Collection,
Manassas VA). Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical
agent such as 5-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in
the case of leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.

Sequencing of the cDNAs 

Methods for DNA sequencing are well known in the art. Conventional enzymatic methods 
employ the Klenow fragment of DNA polymerase I, SEQUENASE DNA polymerase (U.S. 
Biochemical Corporation, Cleveland OH), Taq polymerase (PE Biosystems, Foster City CA), 
thermostable T7 polymerase (Amersham Pharmacia Biotech, Inc. (Amersham Pharmacia Biotech), 
Piscataway NJ), or combinations of polymerases and proofreading exonucleases such as those found 
in the ELONGASE amplification system (Life Technologies Inc. (Life Technologies), Gaithersburg 
MD), to extend the nucleic acid sequence from an oligonucleotide primer annealed to the DNA 
template of interest. Methods have been developed for the use of both single-stranded and
double-stranded templates. Chain termination reaction products may be electrophoresed on
urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled nucleotides) or
by fluorescence (for fluorophore-labeled nucleotides). Automated methods for mechanized reaction
preparation, sequencing, and analysis using fluorescence detection methods have been developed.
Machines used to prepare cDNAs for sequencing can include the MICROLAB 2200 liquid transfer
system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler (PTC200; MJ Research,
Inc. (MJ Research), Watertown MA), and ABI CATALYST 800 thermal cycler (PE Biosystems).
Sequencing can be carried out using, for example, the ABI 373 or 377 (PE Biosystems) or
MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale CA) DNA
sequencing systems, or other automated and manual sequencing systems well known in the art.

The nucleotide sequences of the Sequence Listing have been prepared by current,
state-of-the-art, automated methods and, as such, may contain occasional sequencing errors or unidentified
nucleotides. Such unidentified nucleotides are designated by an N. These infrequent unidentified
bases do not represent a hindrance to practicing the invention for those skilled in the art. Several
methods employing standard recombinant techniques may be used to correct errors and complete the
missing sequence information. (See, e.g., those described in Ausubel, F.M. et al. (1997) Short
Protocols in Molecular Biology, John Wiley & Sons, New York NY; and Sambrook, J. et al. (1989)
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Plainview NY.)

Assembly of cDNA Sequences 

Human polynucleotide sequences may be assembled using programs or algorithms well
known in the art. Sequences to be assembled are related, wholly or in part, and may be derived from
a single or many different transcripts. Assembly of the sequences can be performed using such
programs as PHRAP (Phils Revised Assembly Program) and the GELVIEW fragment assembly
system (GCG), or other methods known in the art.

Alternatively, cDNA sequences are used as "component" sequences that are assembled into
"template" or "consensus" sequences as follows. Sequence chromatograms are processed, verified,
and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway
known as Block 1 (See, e.g., the LIFESEQ Assembled User Guide, Incyte Genomics, Palo Alto, CA).
A series of BLAST comparisons is performed and low-information segments and repetitive elements
(e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n"s, or masked, to prevent spurious
matches. Mitochondrial and ribosomal RNA sequences are also removed. The processed sequences
are then loaded into a relational database management system (RDMS) which assigns edited
sequences to existing templates, if available. When additional sequences are added into the RDMS, a
process is initiated which modifies existing templates or creates new templates from works in
progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences
themselves. After the new sequences have been assigned to templates, the templates can be merged
into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated.
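Masking of simple repeats can be illustrated with a regular expression. The minimal Python sketch below assumes that a "dinucleotide repeat" means at least six tandem copies of a two-base unit. That threshold is hypothetical and not taken from the Block 1 pathway. Alu and other interspersed repeats, which require comparison against a repeat library, are not handled.

```python
import re


def mask_dinucleotide_repeats(seq: str, min_units: int = 6) -> str:
    """Replace runs of at least min_units tandem copies of any 2-base unit with 'n'.

    Homopolymer runs (e.g. twelve A's) also match, since "AA" is a valid unit.
    """
    # ([ACGT]{2}) captures a unit; \1{k,} requires k or more further copies of it.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    return pattern.sub(lambda m: "n" * len(m.group(0)), seq.upper())


print(mask_dinucleotide_repeats("GGT" + "CA" * 8 + "GGT"))  # GGTnnnnnnnnnnnnnnnnGGT
```

Short repeats below the threshold, such as three copies of CA, are left unmasked.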

Once gene bins have been generated based upon sequence alignments, bins are "clone joined"
based upon clone information. Clone joining occurs when the 5' sequence of one clone is present in
one bin and the 3' sequence from the same clone is present in a different bin, indicating that the two
bins should be merged into a single bin. Only bins which share at least two different clones are
merged.
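The clone-joining rule just described, in which bins are merged only when at least two different clones link them, can be sketched with a union-find structure. This is a minimal Python illustration. It assumes each clone contributes exactly one 5' read and one 3' read, with bin assignments supplied as input. `clone_join` and the bin and clone identifiers are hypothetical. Merges are transitive: if bin A is joined to B and B to C, all three end up in one bin.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def clone_join(clone_reads: Dict[str, Tuple[str, str]],
               min_clones: int = 2) -> List[Set[str]]:
    """Merge bins linked by the 5' and 3' reads of at least `min_clones` distinct clones.

    clone_reads maps a clone ID to (bin holding its 5' read, bin holding its 3' read).
    Returns the merged bin groups, sorted for reproducible output.
    """
    links: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    bins: Set[str] = set()
    for clone, (bin5, bin3) in clone_reads.items():
        bins.update((bin5, bin3))
        if bin5 != bin3:
            links[frozenset((bin5, bin3))].add(clone)

    parent = {b: b for b in bins}  # union-find forest over bins

    def find(b: str) -> str:
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path halving keeps trees shallow
            b = parent[b]
        return b

    for pair, clones in links.items():
        if len(clones) >= min_clones:  # a single linking clone is not enough
            first, second = tuple(pair)
            parent[find(first)] = find(second)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for b in bins:
        groups[find(b)].add(b)
    return sorted(groups.values(), key=sorted)


reads = {"c1": ("bin1", "bin2"), "c2": ("bin2", "bin1"),
         "c3": ("bin3", "bin4"), "c4": ("bin5", "bin5")}
print(clone_join(reads))
```

In this example, bin1 and bin2 are merged because two clones link them. Bin3 and bin4 stay separate because only one clone connects them.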

A resultant template sequence may contain either a partial or a full length open reading
frame, or all or part of a genetic regulatory element. This variation is due in part to the fact that the
full length cDNAs of many genes are several hundred, and sometimes several thousand, bases in
length. With current technology, cDNAs comprising the coding regions of large genes cannot be
cloned because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete
"second strand" synthesis. Template sequences may be extended to include additional contiguous
sequences derived from the parent RNA transcript using a variety of methods known to those of skill
in the art. Extension may thus be used to achieve the full length coding sequence of a gene.

Analysis of the cDNA Sequences 

The cDNA sequences are analyzed using a variety of programs and algorithms which are well
known in the art. (See, e.g., Ausubel, 1997, supra, Chapter 7.7; Meyers, R.A. (Ed.) (1995) Molecular
Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853; and Table 5.) These analyses
comprise reading frame determinations, e.g., based on triplet codon periodicity for particular
organisms (Fickett, J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and
stop codons; and homology searches.
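A simplified form of the start- and stop-codon analysis mentioned above is an open reading frame scan. The minimal Python sketch below assumes the standard start codon ATG and the stop codons TAA, TAG, and TGA. It scans only the three forward frames and reports each ORF from its first ATG to the next in-frame stop. The minimum length and the function name `find_orfs` are hypothetical, and codon-periodicity statistics of the kind cited above are not implemented.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    start is the 0-based index of the ATG; end is the exclusive index just
    past the stop codon. Length is counted in codons, including the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG after this stop
    return orfs


example = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG"
print(find_orfs(example, min_codons=5))  # [(2, 2, 17)]
```

In the example, the only ORF starts at index 2 (frame 2) and ends just after the TAA stop codon at index 17. A full analysis would also scan the three frames of the reverse complement.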

Computer programs known to those of skill in the art for performing computer-assisted
searches for amino acid and nucleic acid sequence similarity include, for example, Basic Local
Alignment Search Tool (BLAST; Altschul, S.F. (1993) J. Mol. Evol. 36:290-300; Altschul, S.F. et al.
(1990) J. Mol. Biol. 215:403-410). BLAST is especially useful in determining exact matches and
comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally
maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the
user (Karlin, S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845). Using an appropriate search
tool (e.g., BLAST or HMM), GenBank, SwissProt, BLOCKS, PFAM and other databases may be
searched for sequences containing regions of homology to a query mddt or MDDT of the present
invention.

Other approaches to the identification, assembly, storage, and display of nucleotide and
polypeptide sequences are provided in "Relational Database for Storing Biomolecule Information,"
U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence
Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for
Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998,
all of which are incorporated by reference herein in their entirety.

Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif,
BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example,
in "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence
Data," U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference.

Human Disease Detection and Treatment Molecule Sequences 

The mddt of the present invention may be used for a variety of diagnostic and therapeutic 
purposes. For example, an mddt may be used to diagnose a particular condition, disease, or disorder
associated with disease detection and treatment molecules. Such conditions, diseases, and disorders
include, but are not limited to, a cell proliferative disorder, such as actinic keratosis, arteriosclerosis,
atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis,
paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and
cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma,
teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain,
breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary,
pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and
uterus; and an autoimmune/inflammatory disorder, such as actinic keratosis, acquired
immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome,
allergies, ankylosing spondylitis, amyloidosis, anemia, arteriosclerosis, asthma, atherosclerosis,
autoimmune hemolytic anemia, autoimmune thyroiditis, bronchitis, bursitis, cholecystitis, cirrhosis,
contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus,
emphysema, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis,
Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, paroxysmal nocturnal
hemoglobinuria, hepatitis, hypereosinophilia, irritable bowel syndrome, episodic lymphopenia with 
lymphocytotoxins, mixed connective tissue disease (MCTD), multiple sclerosis, myasthenia gravis, 
myocardial or pericardial inflammation, myelofibrosis, osteoarthritis, osteoporosis, pancreatitis, 
polycythemia vera, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, 
Sjogren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, 
primary thrombocythemia, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, 
complications of cancer, hemodialysis, and extracorporeal circulation, trauma, and hematopoietic 
cancer including lymphoma, leukemia, and myeloma. The mddt can be used to detect the presence 
of, or to quantify the amount of, an mddt-related polynucleotide in a sample. This information is then
compared to information obtained from appropriate reference samples, and a diagnosis is established. 
Alternatively, a polynucleotide complementary to a given mddt can inhibit or inactivate a 
therapeutically relevant gene related to the mddt. 

Analysis of mddt Expression Patterns

The expression of mddt may be routinely assessed by hybridization-based methods to 
determine, for example, the tissue-specificity, disease-specificity, or developmental stage-specificity 
of mddt expression. For example, the level of expression of mddt may be compared among different 
cell types or tissues, among diseased and normal cell types or tissues, among cell types or tissues at 
different developmental stages, or among cell types or tissues undergoing various treatments. This 
type of analysis is useful, for example, to assess the relative levels of mddt expression in fully or 
partially differentiated cells or tissues, to determine if changes in mddt expression levels are 
correlated with the development or progression of specific disease states, and to assess the response 
of a cell or tissue to a specific therapy, for example, in pharmacological or toxicological studies. 
Methods for the analysis of mddt expression are based on hybridization and amplification 
technologies and include membrane-based procedures such as northern blot analysis, high-throughput 
procedures that utilize, for example, microarrays, and PCR-based procedures. 
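A per-gene log2 fold change is one common way to express the comparisons listed above, for example diseased versus normal tissue. The Python sketch below rests on several assumptions: the gene names and counts are hypothetical, a pseudocount is added to avoid division by zero, and a twofold change (|log2 ratio| of at least 1) serves as an arbitrary flagging threshold rather than a statistically tested criterion.

```python
import math


def log2_fold_changes(diseased: dict[str, float], normal: dict[str, float],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """log2((diseased + p) / (normal + p)) for every gene measured in both samples."""
    shared = diseased.keys() & normal.keys()
    return {
        gene: math.log2((diseased[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in shared
    }


def flag_changed(fold_changes: dict[str, float], threshold: float = 1.0) -> list[str]:
    """Genes whose absolute log2 fold change reaches the threshold (1.0 = twofold)."""
    return sorted(g for g, lfc in fold_changes.items() if abs(lfc) >= threshold)


diseased = {"geneA": 31, "geneB": 7, "geneC": 10}  # hypothetical expression counts
normal = {"geneA": 7, "geneB": 7, "geneC": 41}
changes = log2_fold_changes(diseased, normal)
print(flag_changed(changes))  # ['geneA', 'geneC']
```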

Hybridization and Genetic Analysis

The mddt, their fragments, or complementary sequences may be used to identify the presence
of and/or to determine the degree of similarity between two (or more) nucleic acid sequences. The 
mddt may be hybridized to naturally occurring or recombinant nucleic acid sequences under 
appropriately selected temperatures and salt concentrations. Hybridization with a probe based on the 
nucleic acid sequence of at least one of the mddt allows for the detection of nucleic acid sequences, 
including genomic sequences, which are identical or related to the mddt of the Sequence Listing. 
Probes may be selected from non-conserved or unique regions of at least one of the polynucleotides 
of SEQ ID NO: 1-14 and tested for their ability to identify or amplify the target nucleic acid sequence 
using standard protocols. 

Polynucleotide sequences that are capable of hybridizing, in particular, to those shown in 
SEQ ID NO: 1-14 and fragments thereof, can be identified using various conditions of stringency. 
(See, e.g., Wahl, G.M. and S.L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A.R. (1987) 
Methods Enzymol. 152:507-511.) Hybridization conditions are discussed in "Definitions."
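Stringency is usually set relative to the melting temperature (Tm) of the probe-target duplex. Tm depends on probe length, base composition, and salt concentration. The conditions contemplated here are those given under "Definitions." The Python sketch below instead uses two generic textbook approximations to show how these factors enter: the Wallace rule for short oligonucleotides, and the salt-adjusted formula Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 675/N. Both are rough estimates, and neither comes from this document.

```python
import math


def tm_wallace(probe: str) -> float:
    """Wallace rule, for oligonucleotides of roughly 14 nt or fewer: 2 deg C per A/T, 4 deg C per G/C."""
    p = probe.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2.0 * at + 4.0 * gc


def tm_salt_adjusted(probe: str, na_molar: float = 0.05) -> float:
    """Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N, for longer oligonucleotides."""
    p = probe.upper()
    n = len(p)
    pct_gc = 100.0 * (p.count("G") + p.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * pct_gc - 675.0 / n


print(tm_wallace("ACGTACGTACGT"))  # 36.0
```

Raising the hybridization or wash temperature toward Tm, or lowering the salt concentration, increases stringency. Under those conditions only closely matched sequences remain hybridized.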

A probe for use in Southern or northern hybridization may be derived from a fragment of an 
mddt sequence, or its complement, that is up to several hundred nucleotides in length and is either 
single-stranded or double-stranded. Such probes may be hybridized in solution to biological materials
such as plasmids, bacterial, yeast, or human artificial chromosomes, cleared or sectioned tissues, or to 
artificial substrates containing mddt. Microarrays are particularly suitable for identifying the 
presence of and detecting the level of expression for multiple genes of interest by examining gene 
expression correlated with, e.g., various stages of development, treatment with a drug or compound, 
or disease progression. An array analogous to a dot or slot blot may be used to arrange and link 
polynucleotides to the surface of a substrate using one or more of the following: mechanical 
(vacuum), chemical, thermal, or UV bonding procedures. Such an array may contain any number of 
mddt and may be produced by hand or by using available devices, materials, and machines. 

Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g.,
Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci.
USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al.
(1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA
94:2150-2155; and Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662.)
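Analysis of two-color (two-channel) microarrays typically begins with per-spot log ratios between the test and reference fluorescent channels. The minimal Python sketch below computes log2 ratios and applies a global median normalization. That normalization assumes most genes on the array are not differentially expressed. The channel values are hypothetical, and many published pipelines use more elaborate, intensity-dependent normalization.

```python
import math
from statistics import median


def normalized_log_ratios(test_channel: list[float], reference_channel: list[float]) -> list[float]:
    """Per-spot log2(test / reference), shifted so that the median log ratio is zero."""
    if len(test_channel) != len(reference_channel):
        raise ValueError("channels must have one value per spot")
    ratios = [math.log2(t / r) for t, r in zip(test_channel, reference_channel)]
    center = median(ratios)  # global median normalization
    return [x - center for x in ratios]


print(normalized_log_ratios([200, 400, 800], [100, 100, 100]))  # [-1.0, 0.0, 1.0]
```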

Probes may be labeled by either PCR or enzymatic techniques using a variety of 
commercially available reporter molecules. For example, commercial kits are available for 
radioactive and chemiluminescent labeling (Amersham Pharmacia Biotech) and for alkaline 
phosphatase labeling (Life Technologies). Alternatively, mddt may be cloned into commercially 
available vectors for the production of RNA probes. Such probes may be transcribed in the presence 
of at least one labeled nucleotide (e.g., 32P-ATP, Amersham Pharmacia Biotech).

Additionally, the polynucleotides of SEQ ID NO:1-14 or suitable fragments thereof can be
used to isolate full-length cDNA sequences utilizing hybridization and/or amplification procedures
well known in the art, e.g., cDNA library screening, PCR amplification, etc. The molecular cloning
of such full-length cDNA sequences may employ the method of cDNA library screening with probes
using the hybridization, stringency, washing, and probing strategies described above and in Ausubel,
supra, Chapters 3, 5, and 6. These procedures may also be employed with genomic libraries to isolate
genomic sequences of mddt in order to analyze, e.g., regulatory elements. 

Genetic Mapping 

Gene identification and mapping are important in the investigation and treatment of almost all 
conditions, diseases, and disorders. Cancer, cardiovascular disease, Alzheimer's disease, arthritis, 
diabetes, and mental illnesses are of particular interest. Each of these conditions is more complex 
than the single gene defects of sickle cell anemia or cystic fibrosis, with select groups of genes being 
predictive of predisposition for a particular condition, disease, or disorder. For example, 
cardiovascular disease may result from malfunctioning receptor molecules that fail to clear 
cholesterol from the bloodstream, and diabetes may result when a particular individual's immune
system is activated by an infection and attacks the insulin-producing cells of the pancreas. In some
studies, Alzheimer's disease has been linked to a gene on chromosome 21; other studies predict a
different gene and location. Mapping of disease genes is a complex and reiterative process and 
generally proceeds from genetic linkage analysis to physical mapping. 

As a condition is noted among members of a family, a genetic linkage map traces parts of
chromosomes that are inherited in the same pattern as the condition. Statistics link the inheritance of
particular conditions to particular regions of chromosomes, as defined by RFLP or other markers.
(See, for example, Lander, E.S. and Botstein, D. (1986) Proc. Natl. Acad. Sci. USA 83:7353-7357.)
Occasionally, genetic markers and their locations are known from previous studies. More often, 
however, the markers are simply stretches of DNA that differ among individuals. Examples of 
genetic linkage maps can be found in various scientific journals or at the Online Mendelian 
Inheritance in Man (OMIM) World Wide Web site. 
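The LOD score is the statistic classically used to link a condition to a chromosomal region. It is the base-10 log of the likelihood of the observed meioses at recombination fraction θ, relative to the likelihood under free recombination (θ = 0.5): LOD(θ) = R log10 θ + NR log10(1 - θ) - (R + NR) log10 0.5. Here R and NR count recombinant and non-recombinant meioses. The Python sketch below assumes phase-known, fully informative meioses, which real pedigrees rarely provide. It also searches a hypothetical grid of θ values. By convention, a maximum LOD of 3 or more is taken as evidence of linkage.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """log10 likelihood ratio of linkage at theta versus no linkage (theta = 0.5)."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    n = recombinants + nonrecombinants
    log_linked = recombinants * math.log10(theta) + nonrecombinants * math.log10(1.0 - theta)
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


def max_lod(recombinants: int, nonrecombinants: int, grid=None) -> tuple[float, float]:
    """Return (theta, LOD) at the grid point with the highest LOD."""
    grid = grid or [i / 100 for i in range(1, 51)]  # theta = 0.01 ... 0.50
    return max(((t, lod_score(recombinants, nonrecombinants, t)) for t in grid),
               key=lambda pair: pair[1])


theta_hat, best = max_lod(2, 18)  # hypothetical family: 2 recombinants in 20 meioses
print(theta_hat, round(best, 2))  # 0.1 3.2
```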

In another embodiment of the invention, mddt sequences may be used to generate 
hybridization probes useful in chromosomal mapping of naturally occurring genomic sequences. 
Either coding or noncoding sequences of mddt may be used, and in some instances, noncoding 
sequences may be preferable over coding sequences. For example, conservation of an mddt coding 
sequence among members of a multi-gene family may potentially cause undesired cross hybridization 
during chromosomal mapping. The sequences may be mapped to a particular chromosome, to a 
specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial 
chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes 
(BACs), bacterial P1 constructions, or single chromosome cDNA libraries. (See, e.g., Harrington, J.J.
et al. (1997) Nat. Genet. 15:345-355; Price, C.M. (1993) Blood Rev. 7:127-134; and Trask, B.J.
(1991) Trends Genet. 7:149-154.)

Fluorescent in situ hybridization (FISH) may be correlated with other physical chromosome
mapping techniques and genetic map data. (See, e.g., Meyers, supra, pp. 965-968.) Correlation
between the location of mddt on a physical chromosomal map and a specific disorder, or a 
predisposition to a specific disorder, may help define the region of DNA associated with that 
disorder. The mddt sequences may also be used to detect polymorphisms that are genetically linked 
to the inheritance of a particular condition, disease, or disorder. 

In situ hybridization of chromosomal preparations and genetic mapping techniques, such as
linkage analysis using established chromosomal markers, may be used for extending existing genetic
maps. Often the placement of a gene on the chromosome of another mammalian species, such as 
mouse, may reveal associated markers even if the number or arm of the corresponding human 
chromosome is not known. These new marker sequences can be mapped to human chromosomes and 
may provide valuable information to investigators searching for disease genes using positional
cloning or other gene discovery techniques. Once a disease or syndrome has been crudely correlated
by genetic linkage with a particular genomic region, e.g., ataxia-telangiectasia to 11q22-23, any
sequences mapping to that area may represent associated or regulatory genes for further investigation.
(See, e.g., Gatti, R.A. et al. (1988) Nature 336:577-580.) The nucleotide sequences of the subject
invention may also be used to detect differences in chromosomal architecture due to translocation, 
inversion, etc., among normal, carrier, or affected individuals. 

Once a disease-associated gene is mapped to a chromosomal region, the gene must be cloned 
in order to identify mutations or other alterations (e.g., translocations or inversions) that may be 
correlated with disease. This process requires a physical map of the chromosomal region containing 
the disease-gene of interest along with associated markers. A physical map is necessary for 
determining the nucleotide sequence of and order of marker genes on a particular chromosomal
region. Physical mapping techniques are well known in the art and require the generation of 
overlapping sets of cloned DNA fragments from a particular organelle, chromosome, or genome. 
These clones are analyzed to reconstruct and catalog their order. Once the position of a marker is 
determined, the DNA from that region is obtained by consulting the catalog and selecting clones from 
that region. The gene of interest is located through positional cloning techniques using hybridization
or similar methods. 
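Once overlapping clones have been ordered, the catalog described above can be treated as a set of intervals along the chromosome. The Python sketch below assumes each clone's map coordinates are already known; in practice they are inferred from restriction fingerprints or shared markers. It merges overlapping clones into contigs and selects the clones that cover a marker position. The clone names and coordinates are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Clone:
    name: str
    start: int  # hypothetical map coordinate, inclusive
    end: int    # exclusive


@dataclass
class Contig:
    start: int
    end: int
    clones: list[str] = field(default_factory=list)


def build_contigs(clones: list[Clone]) -> list[Contig]:
    """Sort clones along the map and merge those that overlap into contigs."""
    contigs: list[Contig] = []
    for clone in sorted(clones, key=lambda c: (c.start, c.end)):
        if contigs and clone.start < contigs[-1].end:
            contigs[-1].end = max(contigs[-1].end, clone.end)
            contigs[-1].clones.append(clone.name)
        else:
            contigs.append(Contig(clone.start, clone.end, [clone.name]))
    return contigs


def clones_covering(clones: list[Clone], position: int) -> list[str]:
    """Consult the catalog: which clones contain a given marker position?"""
    return sorted(c.name for c in clones if c.start <= position < c.end)


library = [Clone("B", 50, 150), Clone("A", 0, 100), Clone("C", 300, 400)]
print(clones_covering(library, 75))  # ['A', 'B']
```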

Diagnostic Uses

The mddt of the present invention may be used to design probes useful in diagnostic assays. 
Such assays, well known to those skilled in the art, may be used to detect or confirm conditions, 
disorders, or diseases associated with abnormal levels of mddt expression. Labeled probes developed 
from mddt sequences are added to a sample under hybridizing conditions of desired stringency. In 
some instances, mddt, or fragments or oligonucleotides derived from mddt, may be used as primers in 
amplification steps prior to hybridization. The amount of hybridization complex formed is quantified 
and compared with standards for that cell or tissue. If mddt expression varies significantly from the 
standard, the assay indicates the presence of the condition, disorder, or disease. Qualitative or 
quantitative diagnostic methods may include northern, dot blot, or other membrane or dip-stick based
technologies or multiple-sample format technologies such as PCR, enzyme-linked immunosorbent
assay (ELISA)-like, pin, or chip-based assays. 
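The comparison with standards described above can be sketched as a z-score. The z-score compares a patient's measured mddt signal against signals from reference samples of the same cell or tissue type. In this minimal Python example, the reference values and the three-standard-deviation cutoff are hypothetical. The document does not specify how "varies significantly" is to be measured, and a real assay would use validated reference ranges.

```python
from statistics import mean, stdev


def expression_zscore(sample_signal: float, reference_signals: list[float]) -> float:
    """Standard deviations between the sample and the mean of the reference (standard) samples."""
    if len(reference_signals) < 2:
        raise ValueError("need at least two reference samples to estimate spread")
    return (sample_signal - mean(reference_signals)) / stdev(reference_signals)


def deviates_from_standard(sample_signal: float, reference_signals: list[float],
                           z_cutoff: float = 3.0) -> bool:
    """Flag a sample whose signal lies at least z_cutoff standard deviations from the standard."""
    return abs(expression_zscore(sample_signal, reference_signals)) >= z_cutoff


normal_reference = [10.0, 12.0, 14.0]  # hypothetical hybridization signals
```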

The probes described above may also be used to monitor the progress of conditions, 
disorders, or diseases associated with abnormal levels of mddt expression, or to evaluate the efficacy 
of a particular therapeutic treatment. The candidate probe may be identified from the mddt that are 
specific to a given human tissue and have not been observed in GenBank or other genome databases. 
Such a probe may be used in animal studies, preclinical tests, clinical trials, or in monitoring the
treatment of an individual patient. In a typical process, standard expression is established by methods
well known in the art for use as a basis of comparison; samples from patients affected by the disorder
or disease are combined with the probe to evaluate any deviation from the standard profile; and a
therapeutic agent is administered and its effects are monitored to generate a treatment profile. Efficacy
is evaluated by determining whether the expression progresses toward or returns to the standard
normal pattern. Treatment profiles may be generated over a period of several days or several months.
Statistical methods well known to those skilled in the art may be used to determine the significance of
the effects of such therapeutic agents.

The polynucleotides are also useful for identifying individuals from minute biological 
samples, for example, by matching the RFLP pattern of a sample's DNA to that of an individual's
DNA. The polynucleotides of the present invention can also be used to determine the actual 
base-by-base DNA sequence of selected portions of an individual's genome. These sequences can be 
used to prepare PCR primers for amplifying and isolating such selected DNA, which can then be
sequenced. Using this technique, an individual can be identified through a unique set of DNA 
sequences. Once a unique ID database is established for an individual, positive identification of that 
individual can be made from extremely small tissue samples. 
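RFLP matching, mentioned above, relies on sequence differences that create or destroy restriction sites and so change fragment sizes. The in silico digest below is a simplified Python illustration, not a laboratory protocol. It uses the EcoRI site (G^AATTC, cut after the G) as an example enzyme, treats the sequence as linear and single-stranded, and compares patterns as sorted fragment lengths. The sequences are hypothetical.

```python
def restriction_fragments(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting at every occurrence of site (EcoRI cuts G^AATTC)."""
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def same_rflp_pattern(seq_a: str, seq_b: str) -> bool:
    """Two samples match when their sorted fragment sizes (band patterns) agree."""
    return sorted(restriction_fragments(seq_a)) == sorted(restriction_fragments(seq_b))


allele_1 = "AAAGAATTCAAAA"  # contains an EcoRI site
allele_2 = "AAAGAATTGAAAA"  # hypothetical SNP destroys the site
print(restriction_fragments(allele_1), restriction_fragments(allele_2))  # [4, 9] [13]
```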

In a particular aspect, oligonucleotide primers derived from the mddt of the invention may be 
used to detect single nucleotide polymorphisms (SNPs). SNPs are substitutions, insertions and 
deletions that are a frequent cause of inherited or acquired genetic disease in humans. Methods of 
SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) 
and fluorescent SSCP (fSSCP) methods. In SSCP, oligonucleotide primers derived from the 
polynucleotide sequences encoding MDDT are used to amplify DNA using the polymerase chain 
reaction (PCR). The DNA may be derived, for example, from diseased or normal tissue, biopsy 
samples, bodily fluids, and the like. SNPs in the DNA cause differences in the secondary and tertiary 
structures of PCR products in single-stranded form, and these differences are detectable using gel 
electrophoresis in non-denaturing gels. In fSSCP, the oligonucleotide primers are fluorescently
labeled, which allows detection of the amplimers in high-throughput equipment such as DNA 
sequencing machines. Additionally, sequence database analysis methods, termed in silico SNP 
(isSNP), are capable of identifying polymorphisms by comparing the sequence of individual 
overlapping DNA fragments which assemble into common consensus sequences. These computer- 
based methods filter out sequence variations due to laboratory preparation of DNA and sequencing 
errors using statistical models and automated analyses of DNA sequence chromatograms. In the
alternative, SNPs may be detected and characterized by mass spectrometry using, for example, the 
high throughput MASSARRAY system (Sequenom, Inc., San Diego CA). 
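The in silico approach described above compares overlapping sequence reads that assemble into a common consensus. The minimal Python sketch below assumes the reads are already aligned to one coordinate frame, with '-' wherever a read gives no coverage. It reports a position as a candidate SNP only when at least two different bases are each supported by a minimum number of reads. This crude filter is a hypothetical stand-in for the statistical models and chromatogram analysis mentioned in the text, and it does not reproduce them.

```python
from collections import Counter


def candidate_snps(aligned_reads: list[str],
                   min_reads_per_allele: int = 2) -> list[tuple[int, dict[str, int]]]:
    """Return (position, base counts) where two or more bases each have enough supporting reads."""
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("reads must be padded to a common alignment length")
    snps = []
    for pos in range(len(aligned_reads[0])):
        counts = Counter(read[pos] for read in aligned_reads if read[pos] != "-")
        supported = [base for base, n in counts.items() if n >= min_reads_per_allele]
        if len(supported) >= 2:
            snps.append((pos, dict(counts)))
    return snps


reads = [  # hypothetical aligned reads from one locus
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGAACG-",
    "ACGTACCT",  # the single C at position 6 is treated as a likely sequencing error
]
print(candidate_snps(reads))  # [(3, {'T': 3, 'A': 2})]
```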

DNA-based identification techniques are critical in forensic technology. DNA sequences
taken from very small biological samples such as tissues, e.g., hair or skin, or body fluids, e.g., blood,
saliva, semen, etc., can be amplified using, e.g., PCR, to identify individuals. (See, e.g., Erlich, H.
(1992) PCR Technology, Freeman and Co., New York NY.) Similarly, polynucleotides of the
present invention can be used as polymorphic markers. 

There is also a need for reagents capable of identifying the source of a particular tissue.
Appropriate reagents can comprise, for example, DNA probes or primers prepared from the 
sequences of the present invention that are specific for particular tissues. Panels of such reagents can 
identify tissue by species and/or by organ type. In a similar fashion, these reagents can be used to 
screen tissue cultures for contamination. 

The polynucleotides of the present invention can also be used as molecular weight markers on 
nucleic acid gels or Southern blots, as diagnostic probes for the presence of a specific mRNA in a 
particular cell type, in the creation of subtracted cDNA libraries which aid in the discovery of novel 
polynucleotides, in selection and synthesis of oligomers for attachment to an array or other support,
and as an antigen to elicit an immune response. 

Disease Model Systems Using mddt 

The mddt of the invention or their mammalian homologs may be "knocked out" in an animal
model system using homologous recombination in embryonic stem (ES) cells. Such techniques are 
well known in the art and are useful for the generation of animal models of human disease. (See, e.g., 
U.S. Patent Number 5,175,383 and U.S. Patent Number 5,767,337.) For example, mouse ES cells,
such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture. 
The ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene, 
e.g., the neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science 244:1288-1292).
The vector integrates into the corresponding region of the host genome by homologous 
recombination. Alternatively, homologous recombination takes place using the Cre-loxP system to
knock out a gene of interest in a tissue- or developmental stage-specific manner (Marth, J.D. (1996)
Clin. Invest. 97:1999-2002; Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330).
Transformed ES cells are identified and microinjected into mouse cell blastocysts such as those from 
the C57BL/6 mouse strain. The blastocysts are surgically transferred to pseudopregnant dams, and 
the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous 
strains. Transgenic animals thus generated may be tested with potential therapeutic or toxic agents. 

The mddt of the invention may also be manipulated in vitro in ES cells derived from human 
blastocysts. Human ES cells have the potential to differentiate into at least eight separate cell 
lineages including endoderm, mesoderm, and ectodermal cell types. These cell lineages differentiate 
into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A. et al.
(1998) Science 282:1145-1147).

The mddt of the invention can also be used to create "knockin" humanized animals (pigs) or 
transgenic animals (mice or rats) to model human disease. With knockin technology, a region of
mddt is injected into animal ES cells, and the injected sequence integrates into the animal cell 
genome. Transformed cells are injected into blastulae, and the blastulae are implanted as described
above. Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical 
agents to obtain information on treatment of a human disease. Alternatively, a mammal inbred to 
overexpress mddt, resulting, e.g., in the secretion of MDDT in its milk, may also serve as a 
convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74). 

Screening Assays

MDDT encoded by polynucleotides of the present invention may be used to screen for 
molecules that bind to or are bound by the encoded polypeptides. The binding of the polypeptide and 
the molecule may activate (agonist), increase, inhibit (antagonist), or decrease activity of the 
polypeptide or the bound molecule. Examples of such molecules include antibodies, 
oligonucleotides, proteins (e.g., receptors), or small molecules. 

Preferably, the molecule is closely related to the natural ligand of the polypeptide, e.g., a 
ligand or fragment thereof, a natural substrate, or a structural or functional mimetic. (See, Coligan et 
al., (1991) Current Protocols in Immunology 1(2): Chapter 5.) Similarly, the molecule can be closely 
related to the natural receptor to which the polypeptide binds, or to at least a fragment of the receptor, 
e.g., the active site. In either case, the molecule can be rationally designed using known techniques. 
Preferably, the screening for these molecules involves producing appropriate cells which express the 
polypeptide, either as a secreted protein or on the cell membrane. Preferred cells include cells from 
mammals, yeast, Drosophila, or E. coli. Cells expressing the polypeptide or cell membrane fractions
which contain the expressed polypeptide are then contacted with a test compound and binding, 
stimulation, or inhibition of activity of either the polypeptide or the molecule is analyzed. 

An assay may simply test binding of a candidate compound to the polypeptide, wherein
binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label.
Alternatively, the assay may assess binding in the presence of a labeled competitor. 

Additionally, the assay can be carried out using cell-free preparations, polypeptide/molecule 
affixed to a solid support, chemical libraries, or natural product mixtures. The assay may also simply
comprise the steps of mixing a candidate compound with a solution containing a polypeptide, 
measuring polypeptide/molecule activity or binding, and comparing the polypeptide/molecule activity 
or binding to a standard. 
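Comparing binding or activity to a standard is often expressed as percent inhibition relative to an untreated control, after background subtraction. The short Python sketch below shows that arithmetic. The signal values and the 50% hit threshold are hypothetical choices, not criteria stated in this document.

```python
def percent_inhibition(signal: float, control: float, background: float) -> float:
    """100 * (1 - (signal - background) / (control - background))."""
    window = control - background
    if window <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (1.0 - (signal - background) / window)


def is_hit(signal: float, control: float, background: float, threshold: float = 50.0) -> bool:
    """Flag a candidate compound whose inhibition reaches the (hypothetical) threshold."""
    return percent_inhibition(signal, control, background) >= threshold
```

A negative value means the compound increased the signal above the control, as an agonist or enhancer might.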

Preferably, an ELISA assay using, e.g., a monoclonal or polyclonal antibody, can measure
polypeptide level in a sample. The antibody can measure polypeptide level by either binding, directly 
or indirectly, to the polypeptide or by competing with the polypeptide for a substrate. 
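Measuring polypeptide level by ELISA generally means converting an absorbance reading into a concentration. The conversion uses a standard curve built from known amounts of the polypeptide. The Python sketch below interpolates linearly between the two bracketing standards. This is a simplification, since many assays fit a four-parameter logistic curve instead. The standard concentrations and absorbances are hypothetical.

```python
from bisect import bisect_left


def interpolate_concentration(signal: float, standards: list[tuple[float, float]]) -> float:
    """Map a signal to a concentration using (concentration, signal) standards.

    Assumes the signal increases monotonically with concentration.
    """
    points = sorted(standards, key=lambda p: p[1])
    signals = [s for _, s in points]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal lies outside the standard curve; dilute or re-run")
    k = bisect_left(signals, signal)
    if signals[k] == signal:
        return points[k][0]
    (c_lo, s_lo), (c_hi, s_hi) = points[k - 1], points[k]
    return c_lo + (c_hi - c_lo) * (signal - s_lo) / (s_hi - s_lo)


# hypothetical standards: (ng/mL, absorbance)
standards = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
```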

All of the above assays can be used in a diagnostic or prognostic context. The molecules 
discovered using these assays can be used to treat disease or to bring about a particular result in a 
patient (e.g., blood vessel growth) by activating or inhibiting the polypeptide/molecule. Moreover, the 
assays can discover agents which may inhibit or enhance the production of the polypeptide from 
suitably manipulated cells or tissues. 

Transcript Imaging

Another embodiment relates to the use of mddt to develop a transcript image of a tissue or 
cell type. A transcript image is the collective pattern of gene expression by a particular tissue or cell 
type under given conditions and at a given time. This pattern of gene expression is defined by the 
number of expressed genes, their abundance, and their function. Thus the mddt of the present 
invention may be used to develop a transcript image of a tissue or cell type by hybridizing, preferably 
in a microarray format, the mddt of the present invention to the totality of transcripts or reverse 
transcripts of a tissue or cell type. The resultant transcript image would provide a profile of gene 
activity pertaining to disease detection and treatment. 

Transcript images which profile mddt expression may be generated using transcripts isolated 
from tissues, cell lines, biopsies, or other biological samples. The transcript image may thus reflect 
mddt expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell
line. Transcript images may be used to profile mddt expression in distinct tissue types. This process 
can be used to determine disease detection and treatment molecule activity in a particular tissue type 
relative to this activity in a different tissue type. Transcript images may be used to generate a profile 
of mddt expression characteristic of diseased tissue. Transcript images of tissues before and after 
treatment may be used for diagnostic purposes, to monitor the progression of disease, and to monitor 
the efficacy of drug treatments for diseases which affect the activity of disease detection and 
treatment molecules. 

Transcript images which profile mddt expression may also be used in conjunction with in 
vitro model systems and preclinical evaluation of pharmaceuticals. Transcript images of cell lines
can be used to assess disease detection and treatment molecule activity and/or to identify cell lines 
that lack or misregulate this activity. Such cell lines may then be treated with pharmaceutical agents,
and a transcript image following treatment may indicate the efficacy of these agents in restoring 
desired levels of this activity. A similar approach may be used to assess the toxicity of 
pharmaceutical agents as reflected by undesirable changes in disease detection and treatment 
molecule activity. Candidate pharmaceutical agents may be evaluated by comparing their associated
transcript images with those of pharmaceutical agents of known effectiveness. 
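Any profile-similarity measure can be used to compare a candidate agent's transcript image with those of agents of known effectiveness. The Python sketch below is one hedged illustration. It represents each transcript image as a vector of per-gene log expression ratios (treated versus untreated) over the same ordered gene set, then ranks reference agents by Pearson correlation with the candidate. The agent names and values are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_reference_agents(candidate: list[float],
                          references: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Reference agents ordered from most to least similar transcript image."""
    scores = {name: pearson(candidate, profile) for name, profile in references.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


candidate = [1.0, -1.0, 0.5, 0.0]  # hypothetical log ratios for four genes
references = {
    "agent_X": [2.0, -2.0, 1.0, 0.0],
    "agent_Y": [-1.0, 1.0, -0.5, 0.0],
}
```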

Antisense Molecules 

The polynucleotides of the present invention are useful in antisense technology. Antisense 
technology or therapy relies on the modulation of expression of a target protein through the specific 
binding of an antisense sequence to a target sequence encoding the target protein or directing its 
expression. (See, e.g., Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa
NJ; Alama, A. et al. (1997) Pharmacol. Res. 36(3):171-178; Crooke, S.T. (1997) Adv. Pharmacol.
40:1-49; Sharma, H.W. and R. Narayanan (1995) Bioessays 17(12):1055-1063; and Lavrosky, Y. et
al. (1997) Biochem. Mol. Med. 62(1):11-22.) An antisense sequence is a polynucleotide sequence
capable of specifically hybridizing to at least a portion of the target sequence. Antisense sequences
bind to cellular mRNA and/or genomic DNA, affecting translation and/or transcription. Antisense
sequences can be DNA, RNA, or nucleic acid mimics and analogs. (See, e.g., Rossi, J.J. et al. (1991)
Antisense Res. Dev. 1(3):285-288; Lee, R. et al. (1998) Biochemistry 37(3):900-1010; Pardridge,
W.M. et al. (1995) Proc. Natl. Acad. Sci. USA 92(12):5592-5596; and Nielsen, P.E. and Haaima, G.
(1997) Chem. Soc. Rev. 96:73-78.) Typically, the binding which results in modulation of expression
occurs through hybridization or binding of complementary base pairs. Antisense sequences can also 
bind to DNA duplexes through specific interactions in the major groove of the double helix. 
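Antisense binding depends on complementary base pairing, so the sequence of an antisense oligonucleotide, written 5' to 3', is the reverse complement of its target region. The minimal Python sketch below derives a DNA antisense sequence for a hypothetical target region of an RNA or DNA strand. It does not address chemical modifications, target-site accessibility, or off-target hybridization, all of which matter in practice.

```python
# Maps each base of the target strand to its Watson-Crick partner in DNA (U pairs with A).
_DNA_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def antisense_dna(target_5_to_3: str) -> str:
    """Antisense DNA oligonucleotide (5'->3') that pairs with the given target region."""
    target = target_5_to_3.upper()
    if set(target) - set("ACGTU"):
        raise ValueError("target may contain only A, C, G, T, or U")
    return target.translate(_DNA_COMPLEMENT)[::-1]


def pairs_perfectly(oligo_5_to_3: str, target_5_to_3: str) -> bool:
    """True when the oligo is the exact antisense of the target region."""
    return oligo_5_to_3.upper() == antisense_dna(target_5_to_3)


print(antisense_dna("AUGGCC"))  # GGCCAT
```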

The polynucleotides of the present invention and fragments thereof can be used as antisense 
sequences to modify the expression of the polypeptide encoded by mddt. The antisense sequences 
can be produced ex vivo, such as by using any of the ABI nucleic acid synthesizer series (PE
Biosystems) or other automated systems known in the art. Antisense sequences can also be produced
biologically, such as by transforming an appropriate host cell with an expression vector containing
the sequence of interest. (See, e.g., Agrawal, supra.)

In therapeutic use, any gene delivery system suitable for introduction of the antisense 
sequences into appropriate target cells can be used. Antisense sequences can be delivered 
intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence 
complementary to at least a portion of the cellular sequence encoding the target protein. (See, e.g.,
Slater, J.E., et al. (1998) J. Allergy Clin. Immunol. 102(3):469-475; and Scanlon, K.J., et al. (1995)
9(13):1288-1296.) Antisense sequences can also be introduced intracellularly through the use of viral
vectors, such as retrovirus and adeno-associated virus vectors. (See, e.g., Miller, A.D. (1990) Blood
76:271; Ausubel, F.M. et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons,
New York NY; Uckert, W. and W. Walther (1994) Pharmacol. Ther. 63(3):323-347.) Other gene
delivery mechanisms include liposome-derived systems, artificial viral envelopes, and other systems
known in the art. (See, e.g., Rossi, J.J. (1995) Br. Med. Bull. 51(1):217-225; Boado, R.J. et al. (1998)
J. Pharm. Sci. 87(11):1308-1315; and Morris, M.C. et al. (1997) Nucleic Acids Res. 25(14):2730-2736.)

Expression

In order to express a biologically active MDDT, the nucleotide sequences encoding MDDT or 
fragments thereof may be inserted into an appropriate expression vector, i.e., a vector which contains
the necessary elements for transcriptional and translational control of the inserted coding sequence in
a suitable host. Methods which are well known to those skilled in the art may be used to construct
expression vectors containing sequences encoding MDDT and appropriate transcriptional and
translational control elements. These methods include in vitro recombinant DNA techniques,
synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, supra, Chapters 4, 8,
16, and 17; and Ausubel, supra, Chapters 9, 10, 13, and 16.)

A variety of expression vector/host systems may be utilized to contain and express sequences
encoding MDDT. These include, but are not limited to, microorganisms such as bacteria transformed
with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with
yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus);
plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV,
or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or
animal (mammalian) cell systems. (See, e.g., Sambrook, supra; Ausubel, 1995, supra; Van Heeke, G.
and S.M. Schuster (1989) J. Biol. Chem. 264:5503-5509; Bitter, G.A. et al. (1987) Methods Enzymol.
153:516-544; Scorer, C.A. et al. (1994) Bio/Technology 12:181-184; Engelhard, E.K. et al. (1994)
Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945;
Takamatsu, N. (1987) EMBO J. 6:307-311; Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie,
R. et al. (1984) Science 224:838-843; Winter, J. et al. (1991) Results Probl. Cell Differ. 17:85-105;
The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York NY, pp.
191-196; Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659; and Harrington,
J.J. et al. (1997) Nat. Genet. 15:345-355.) Expression vectors derived from retroviruses,
adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids, may be used for
delivery of nucleotide sequences to the targeted organ, tissue, or cell population. (See, e.g., Di
Nicola, M. et al. (1998) Cancer Gen. Ther. 5(6):350-356; Yu, M. et al. (1993) Proc. Natl. Acad. Sci.
USA 90(13):6340-6344; Buller, R.M. et al. (1985) Nature 317(6040):813-815; McGregor, D.P. et al.
(1994) Mol. Immunol. 31(3):219-226; and Verma, I.M. and N. Somia (1997) Nature 389:239-242.)
The invention is not limited by the host cell employed.

For long-term production of recombinant proteins in mammalian systems, stable expression
of MDDT in cell lines is preferred. For example, sequences encoding MDDT can be transformed into
cell lines using expression vectors which may contain viral origins of replication and/or endogenous
expression elements and a selectable marker gene on the same or on a separate vector. Any number
of selection systems may be used to recover transformed cell lines. (See, e.g., Wigler, M. et al.
(1977) Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823; Wigler, M. et al. (1980) Proc. Natl.
Acad. Sci. USA 77:3567-3570; Colbere-Garapin, F. et al. (1981) J. Mol. Biol. 150:1-14; Hartman,
S.C. and R.C. Mulligan (1988) Proc. Natl. Acad. Sci. USA 85:8047-8051; Rhodes, C.A. (1995)
Methods Mol. Biol. 55:121-131.)

Therapeutic Uses of mddt 

The mddt of the invention may be used for somatic or germline gene therapy. Gene therapy
may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined
immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et
al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an
inherited adenosine deaminase (ADA) deficiency (Blaese, R.M. et al. (1995) Science 270:475-480;
Bordignon, C. et al. (1995) Science 270:470-475), cystic fibrosis (Zabner, J. et al. (1993) Cell
75:207-216; Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:643-666; Crystal, R.G. et al. (1995) Hum.
Gene Therapy 6:667-703), thalassemias, familial hypercholesterolemia, and hemophilia resulting from
Factor VIII or Factor IX deficiencies (Crystal, R.G. (1995) Science 270:404-410; Verma, I.M. and
Somia, N. (1997) Nature 389:239-242)), (ii) express a conditionally lethal gene product (e.g., in the
case of cancers which result from unregulated cell proliferation), or (iii) express a protein which
affords protection against intracellular parasites (e.g., against human retroviruses, such as human
immunodeficiency virus (HIV) (Baltimore, D. (1988) Nature 335:395-396; Poeschla, E. et al. (1996)
Proc. Natl. Acad. Sci. USA 93:11395-11399), hepatitis B or C virus (HBV, HCV); fungal parasites,
such as Candida albicans and Paracoccidioides brasiliensis; and protozoan parasites such as
Plasmodium falciparum and Trypanosoma cruzi). In the case where a genetic deficiency in mddt
expression or regulation causes disease, the expression of mddt from an appropriate population of
transduced cells may alleviate the clinical manifestations caused by the genetic deficiency.

In a further embodiment of the invention, diseases or disorders caused by deficiencies in
mddt are treated by constructing mammalian expression vectors comprising mddt and introducing
these vectors by mechanical means into mddt-deficient cells. Mechanical transfer technologies for
use with cells in vivo or ex vivo include (i) direct DNA microinjection into individual cells, (ii)
ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene
transfer, and (v) the use of DNA transposons (Morgan, R.A. and Anderson, W.F. (1993) Annu. Rev.
Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and Recipon, H. (1998) Curr.
Opin. Biotechnol. 9:445-450).

Expression vectors that may be effective for the expression of mddt include, but are not
limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX vectors (Invitrogen, Carlsbad CA),
PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla CA), and PTET-OFF,
PTET-ON, PTRE2, PTRE2-LUC, PTK-HYG (Clontech, Palo Alto CA). The mddt of the invention
may be expressed using (i) a constitutively active promoter (e.g., from cytomegalovirus (CMV),
Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible
promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and Bujard, H. (1992) Proc. Natl.
Acad. Sci. U.S.A. 89:5547-5551; Gossen, M. et al. (1995) Science 268:1766-1769; Rossi, F.M.V.
and Blau, H.M. (1998) Curr. Opin. Biotechnol. 9:451-456), commercially available in the T-REX
plasmid (Invitrogen); the ecdysone-inducible promoter (available in the plasmids PVGRXR and
PIND; Invitrogen); the FK506/rapamycin inducible promoter; or the RU486/mifepristone inducible
promoter (Rossi, F.M.V. and Blau, H.M. supra)), or (iii) a tissue-specific promoter or the native
promoter of the endogenous gene encoding MDDT from a normal individual.

Commercially available liposome transformation kits (e.g., the PERFECT LIPID 
TRANSFECTION KIT, available from Invitrogen) allow one with ordinary skill in the art to deliver 
polynucleotides to target cells in culture and require minimal effort to optimize experimental 
parameters. In the alternative, transformation is performed using the calcium phosphate method 
(Graham, F.L. and Eb, A.J. (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al. 
(1982) EMBO J. 1:841-845). The introduction of DNA to primary cells requires modification of
these standardized mammalian transfection protocols. 

In another embodiment of the invention, diseases or disorders caused by genetic defects with
respect to mddt expression are treated by constructing a retrovirus vector consisting of (i) the mddt of
the invention under the control of an independent promoter or the retrovirus long terminal repeat
(LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE)
along with additional retrovirus cis-acting RNA sequences and coding sequences required for
efficient vector propagation. Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available
(Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. U.S.A.
92:6733-6737), incorporated by reference herein. The vector is propagated in an appropriate vector
producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target
cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol.
61:1647-1650; Bender, M.A. et al. (1987) J. Virol. 61:1639-1646; Adam, M.A. and Miller, A.D. (1988)
J. Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R. et al. (1998) J. Virol.
72:9873-9880). U.S. Patent Number 5,910,434 to Rigg ("Method for obtaining retrovirus packaging
cell lines producing high transducing efficiency retroviral supernatant") discloses a method for
obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of
retrovirus vectors, transduction of a population of cells (e.g., CD4+ T-cells), and the return of
transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy
and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al.
(1997) Blood 89:2259-2267; Bonyhadi, M.L. (1997) J. Virol. 71:4707-4716; Ranga, U. et al. (1998)
Proc. Natl. Acad. Sci. U.S.A. 95:1201-1206; Su, L. (1997) Blood 89:2283-2290).

In the alternative, an adenovirus-based gene therapy delivery system is used to deliver mddt
to cells which have one or more genetic abnormalities with respect to the expression of mddt. The
construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in
the art. Replication-defective adenovirus vectors have proven to be versatile for importing genes
encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M.E. et al. (1995)
Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Patent
Number 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"), hereby incorporated by
reference. For adenoviral vectors, see also Antinozzi, P.A. et al. (1999) Annu. Rev. Nutr. 19:511-544
and Verma, I.M. and Somia, N. (1997) Nature 389:239-242, both incorporated by reference herein.

In another alternative, a herpes-based gene therapy delivery system is used to deliver mddt to
target cells which have one or more genetic abnormalities with respect to the expression of mddt.
The use of herpes simplex virus (HSV)-based vectors may be especially valuable for introducing
mddt to cells of the central nervous system, for which HSV has a tropism. The construction and
packaging of herpes-based vectors are well known to those with ordinary skill in the art. A
replication-competent herpes simplex virus (HSV) type 1-based vector has been used to deliver a
reporter gene to the eyes of primates (Liu, X. et al. (1999) Exp. Eye Res. 169:385-395). The
construction of an HSV-1 virus vector has also been disclosed in detail in U.S. Patent Number
5,804,413 to DeLuca ("Herpes simplex virus strains for gene transfer"), which is hereby incorporated
by reference. U.S. Patent Number 5,804,413 teaches the use of recombinant HSV d92 which consists
of a genome containing at least one exogenous gene to be transferred to a cell under the control of the
appropriate promoter for purposes including human gene therapy. Also taught by this patent are the
construction and use of recombinant HSV strains deleted for ICP4, ICP27 and ICP22. For HSV
vectors, see also Goins, W.F. et al. (1999) J. Virol. 73:519-532 and Xu, H. et al. (1994) Dev. Biol.
163:152-161, hereby incorporated by reference. The manipulation of cloned herpesvirus sequences,
the generation of recombinant virus following the transfection of multiple plasmids containing
different segments of the large herpesvirus genomes, the growth and propagation of herpesvirus, and
the infection of cells with herpesvirus are techniques well known to those of ordinary skill in the art.

In another alternative, an alphavirus (positive, single-stranded RNA virus) vector is used to
deliver mddt to target cells. The biology of the prototypic alphavirus, Semliki Forest Virus (SFV),
has been studied extensively and gene transfer vectors have been based on the SFV genome (Garoff,
H. and Li, K-J. (1998) Curr. Opin. Biotech. 9:464-469). During alphavirus RNA replication, a
subgenomic RNA is generated that normally encodes the viral capsid proteins. This subgenomic
RNA replicates to higher levels than the full-length genomic RNA, resulting in the overproduction of
capsid proteins relative to the viral proteins with enzymatic activity (e.g., protease and polymerase).
Similarly, inserting mddt into the alphavirus genome in place of the capsid-coding region results in
the production of a large number of mddt RNAs and the synthesis of high levels of MDDT in
vector-transduced cells. While alphavirus infection is typically associated with cell lysis within a few
days, the ability to establish a persistent infection in baby hamster kidney cells (BHK-21) with a variant
of Sindbis virus (SIN) indicates that the lytic replication of alphaviruses can be altered to suit the
needs of the gene therapy application (Dryga, S.A. et al. (1997) Virology 228:74-83). The wide host
range of alphaviruses will allow the introduction of MDDT into a variety of cell types. The specific
transduction of a subset of cells in a population may require the sorting of cells prior to transduction.
The methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA
and RNA transfections, and performing alphavirus infections are well known to those with ordinary
skill in the art.



Antibodies 

Anti-MDDT antibodies may be used to analyze protein expression levels. Such antibodies
include, but are not limited to, polyclonal, monoclonal, chimeric, and single-chain antibodies and Fab
fragments. For descriptions of and protocols for antibody technologies, see, e.g., Pound, J.D. (1998)
Immunochemical Protocols, Humana Press, Totowa NJ.

The amino acid sequence encoded by the mddt of the Sequence Listing may be analyzed by
appropriate software (e.g., LASERGENE NAVIGATOR software, DNASTAR) to determine regions
of high immunogenicity. The optimal sequences for immunization are selected from the C-terminus,
the N-terminus, and those intervening, hydrophilic regions of the polypeptide which are likely to be
exposed to the external environment when the polypeptide is in its natural conformation. Analysis
used to select appropriate epitopes is also described by Ausubel (1997, supra, Chapter 11.7). Peptides
used for antibody induction do not need to have biological activity; however, they must be antigenic.
Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least five
amino acids, preferably at least 10 amino acids, and most preferably 15 amino acids. A peptide which
mimics an antigenic fragment of the natural polypeptide may be fused with another protein such as
keyhole limpet hemocyanin (KLH; Sigma, St. Louis MO) for antibody production. A peptide
encompassing an antigenic region may be expressed from an mddt, synthesized as described above, or
purified from human cells.
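The text above selects immunizing peptides from hydrophilic, likely surface-exposed regions, using software such as LASERGENE NAVIGATOR. That program's method is not reproduced here. As a stand-in, the Python sketch below ranks sliding windows by mean Kyte-Doolittle hydropathy (a swapped-in technique, not the one named in the text) and returns the most hydrophilic window, defaulting to 15 residues, the most preferred peptide length above. The function names are hypothetical, and a real epitope choice would also weigh the termini, surface exposure, and antigenicity.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein: str, window: int = 15) -> list[tuple[int, float]]:
    """Mean hydropathy of every window, as (0-based start, mean score) pairs."""
    seq = protein.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]  # KeyError flags non-standard residues
    total = sum(scores[:window])
    results = [(0, total / window)]
    for start in range(1, len(seq) - window + 1):
        # Slide the window one residue: add the new right end, drop the old left end.
        total += scores[start + window - 1] - scores[start - 1]
        results.append((start, total / window))
    return results


def most_hydrophilic_peptide(protein: str, window: int = 15) -> str:
    """Candidate immunizing peptide: the window with the lowest mean hydropathy."""
    start, _ = min(hydropathy_windows(protein, window), key=lambda item: item[1])
    return protein[start:start + window]
```

A sliding sum keeps the scan linear in the protein length; the lowest-scoring window is only a starting point for the terminal and surface-exposure criteria described above.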

Procedures well known in the art may be used for the production of antibodies. Various hosts,
including mice, goats, and rabbits, may be immunized by injection with a peptide. Depending on the
host species, various adjuvants may be used to increase immunological response.

In one procedure, peptides about 15 residues in length may be synthesized using an ABI
431A peptide synthesizer (PE Biosystems) using Fmoc chemistry and coupled to KLH (Sigma) by
reaction with m-maleimidobenzoyl-N-hydroxysuccinimide ester (Ausubel, 1995, supra). Rabbits are
immunized with the peptide-KLH complex in complete Freund's adjuvant. The resulting antisera are
tested for antipeptide activity by binding the peptide to plastic, blocking with 1% bovine serum
albumin (BSA), reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti- 
rabbit IgG. Antisera with antipeptide activity are tested for anti-MDDT activity using protocols well 
known in the art, including ELISA, radioimmunoassay (RIA), and immunoblotting. 

In another procedure, isolated and purified peptide may be used to immunize mice (about 100
µg of peptide) or rabbits (about 1 mg of peptide). Subsequently, the peptide is radioiodinated and
used to screen the immunized animals' B-lymphocytes for production of antipeptide antibodies. 
Positive cells are then used to produce hybridomas using standard techniques. About 20 mg of 
peptide is sufficient for labeling and screening several thousand clones. Hybridomas of interest are 
detected by screening with radioiodinated peptide to identify those fusions producing peptide-specific 
monoclonal antibody. In a typical protocol, wells of a multi-well plate (FAST, Becton-Dickinson,
Palo Alto CA) are coated with affinity-purified, specific rabbit anti-mouse (or suitable anti-species
IgG) antibodies at 10 mg/ml. The coated wells are blocked with 1% BSA, washed, and exposed to
supernatants from hybridomas. After incubation, the wells are exposed to radiolabeled peptide at 1
mg/ml.

Clones producing antibodies bind a quantity of labeled peptide that is detectable above 
background. Such clones are expanded and subjected to 2 cycles of cloning. Cloned hybridomas are 
injected into pristane-treated mice to produce ascites, and monoclonal antibody is purified from the 
ascitic fluid by affinity chromatography on protein A (Amersham Pharmacia Biotech). Several 
procedures for the production of monoclonal antibodies, including in vitro production, are described 
in Pound (supra). Monoclonal antibodies with antipeptide activity are tested for anti-MDDT activity
using protocols well known in the art, including ELISA, RIA, and immunoblotting. 

Antibody fragments containing specific binding sites for an epitope may also be generated. 
For example, such fragments include, but are not limited to, the F(ab')2 fragments produced by pepsin
digestion of the antibody molecule, and the Fab fragments generated by reducing the disulfide bridges
of the F(ab')2 fragments. Alternatively, construction of Fab expression libraries in filamentous
bacteriophage allows rapid and easy identification of monoclonal fragments with desired specificity
(Pound, supra, Chaps. 45-47). Antibodies generated against polypeptide encoded by mddt can be used
to purify and characterize full-length MDDT protein and its activity, binding partners, etc. 

Assays Using Antibodies 

Anti-MDDT antibodies may be used in assays to quantify the amount of MDDT found in a
particular human cell. Such assays include methods utilizing the antibody and a label to detect
expression level under normal or disease conditions. The peptides and antibodies of the invention
may be used with or without modification or labeled by joining them, either covalently or
noncovalently, with a reporter molecule.

Protocols for detecting and measuring protein expression using either polyclonal or
monoclonal antibodies are well known in the art. Examples include ELISA, RIA, and
fluorescence-activated cell sorting (FACS). Such immunoassays typically involve the formation of
complexes between the MDDT and its specific antibody and the measurement of such complexes.
These and other assays are described in Pound (supra).

Without further elaboration, it is believed that one skilled in the art can, using the preceding 
description, utilize the present invention to its fullest extent. The following preferred specific 
embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder
of the disclosure in any way whatsoever. 

The disclosures of all patents, applications, and publications mentioned above and below, in 
particular U.S. Provisional Application No. 60/137,412, filed June 3, 1999, U.S. Provisional 
Application No. 60/147,542, filed August 5, 1999, U.S. Provisional Application No. 60/147,501, filed 
August 5, 1999, U.S. Provisional Application No. 60/147,500, filed August 5, 1999 are hereby 
expressly incorporated by reference. 

EXAMPLES 

I. Construction of cDNA Libraries 

RNA was purchased from CLONTECH Laboratories, Inc. (Palo Alto CA) or isolated from
various tissues. Some tissues were homogenized and lysed in guanidinium isothiocyanate, while
others were homogenized and lysed in phenol or in a suitable mixture of denaturants, such as
TRIZOL (Life Technologies), a monophasic solution of phenol and guanidine isothiocyanate. The 
resulting lysates were centrifuged over CsCl cushions or extracted with chloroform. RNA was 
precipitated with either isopropanol or sodium acetate and ethanol, or by other routine methods. 

Phenol extraction and precipitation of RNA were repeated as necessary to increase RNA 
purity. In most cases, RNA was treated with DNase. For most libraries, poly(A)+ RNA was isolated
using oligo d(T)-coupled paramagnetic particles (Promega Corporation (Promega), Madison WI),
OLIGOTEX latex particles (QIAGEN, Inc. (QIAGEN), Valencia CA), or an OLIGOTEX mRNA
purification kit (QIAGEN). Alternatively, RNA was isolated directly from tissue lysates using other
RNA isolation kits, e.g., the POLY(A)PURE mRNA purification kit (Ambion, Inc., Austin TX).

In some cases, Stratagene was provided with RNA and constructed the corresponding cDNA
libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP
vector system (Stratagene Cloning Systems, Inc. (Stratagene), La Jolla CA) or SUPERSCRIPT
plasmid system (Life Technologies), using the recommended procedures or similar methods known in
the art. (See, e.g., Ausubel, 1997, supra, Chapters 5.1 through 6.6.) Reverse transcription was
initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to
double-stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or
enzymes. For most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000,
SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia
Biotech) or preparative agarose gel electrophoresis. cDNAs were ligated into compatible restriction
enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene),
pSPORT1 plasmid (Life Technologies), or pINCY (Incyte). Recombinant plasmids were transformed
into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α,
DH10B, or ElectroMAX DH10B from Life Technologies.

II. Isolation of cDNA Clones

Plasmids were recovered from host cells by in vivo excision using the UNIZAP vector system
(Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: the Magic or 
WIZARD Minipreps DNA purification system (Promega); the AGTC Miniprep purification kit (Edge 
BioSystems, Gaithersburg MD); and the QIAWELL 8, QIAWELL 8 Plus, and QIAWELL 8 Ultra 
plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit (QIAGEN). 
Following precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or
without lyophilization, at 4°C. 

Alternatively, plasmid DNA was amplified from host cell lysates using direct link PCR in a 
high-throughput format. (Rao, V.B. (1994) Anal. Biochem. 216:1-14.) Host cell lysis and thermal
cycling steps were carried out in a single reaction mixture. Samples were processed and stored in 
384-well plates, and the concentration of amplified plasmid DNA was quantified fluorometrically 
using PICOGREEN dye (Molecular Probes, Inc. (Molecular Probes), Eugene OR) and a 
FLUOROSKAN II fluorescence scanner (Labsystems Oy, Helsinki, Finland). 
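Fluorometric DNA quantitation of this kind is commonly read against a standard curve. Assuming the PICOGREEN signal is linear in DNA concentration over the working range, a least-squares line can be fitted to standards of known concentration and then inverted for each sample, as in the Python sketch below. The standards, units, and function names are hypothetical; plate layout, blank subtraction, and replicate handling are omitted.

```python
def fit_standard_curve(conc: list[float], signal: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of signal = slope * conc + intercept."""
    n = len(conc)
    if n != len(signal) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_x = sum(conc) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, signal))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(sample_signal: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate a sample's DNA concentration."""
    return (sample_signal - intercept) / slope
```

For example, standards of 0, 1, 2, and 4 (arbitrary concentration units) reading 10, 110, 210, and 410 fluorescence units give a slope of 100 and an intercept of 10, so a sample reading 310 corresponds to a concentration of 3.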

III. Sequencing and Analysis

cDNA sequencing reactions were processed using standard methods or high-throughput
instrumentation such as the ABI CATALYST 800 thermal cycler (PE Biosystems) or the PTC-200
thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific
Corp., Sunnyvale CA) or the MICROLAB 2200 liquid transfer system (Hamilton). cDNA sequencing
reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI
sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (PE
Biosystems). Electrophoretic separation of cDNA sequencing reactions and detection of labeled
polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular
Dynamics); the ABI PRISM 373 or 377 sequencing system (PE Biosystems) in conjunction with
standard ABI protocols and base calling software; or other sequence analysis systems known in the
art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in
Ausubel, 1997, supra, Chapter 7.7). Some of the cDNA sequences were selected for extension using
the techniques disclosed in Example VIII.

IV. Assembly and Analysis of Sequences 

Component sequences from chromatograms were subjected to PHRED analysis and assigned a
quality score. The sequences having at least a required quality score were subjected to various
pre-processing editing pathways to eliminate, e.g., low-quality 3' ends, vector and linker sequences,
polyA tails, Alu repeats, mitochondrial and ribosomal sequences, bacterial contamination sequences,
and sequences smaller than 50 base pairs. In particular, low-information sequences and repetitive
elements (e.g., dinucleotide repeats, Alu repeats, etc.) were replaced by "n"s, or masked, to prevent
spurious matches.
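A PHRED quality score Q corresponds to an estimated base-call error probability P through Q = -10 log10 P, so Q = 20 means about a 1% chance of error. The Python sketch below shows that relationship and simplified versions of three of the pre-processing steps listed above: trimming low-quality 3' ends, discarding sequences shorter than 50 bases, and masking dinucleotide repeats with N's. The quality cutoff of 20 and the six-unit repeat threshold are hypothetical, since the text does not give the values used; vector, Alu, mitochondrial, and contaminant screening are not shown, and the function names are illustrative only.

```python
import re


def phred_error_probability(q: float) -> float:
    """PHRED quality Q corresponds to an error probability P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def trim_low_quality_3prime(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Remove trailing (3') bases whose quality is below min_q (hypothetical cutoff)."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists differ in length")
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


# A dinucleotide unit repeated at least six times in a row (e.g. CACACACACACA).
# Homopolymer runs such as AAAAAAAAAAAA also match, which suits low-information masking.
_DINUC_REPEAT = re.compile(r"([ACGT]{2})\1{5,}")


def mask_dinucleotide_repeats(seq: str) -> str:
    """Replace each repeat run with N's so it cannot seed spurious matches."""
    return _DINUC_REPEAT.sub(lambda m: "N" * len(m.group(0)), seq)


def preprocess(seq: str, quals: list[int], min_q: int = 20, min_len: int = 50):
    """Trim, drop reads under min_len bases, then mask; None means discarded."""
    trimmed = trim_low_quality_3prime(seq, quals, min_q)
    if len(trimmed) < min_len:
        return None
    return mask_dinucleotide_repeats(trimmed)
```

Masking with N's rather than deleting keeps the read length and coordinates intact, which matters when component positions along a template are reported later.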

Processed sequences were then subjected to assembly procedures in which the sequences were
assigned to gene bins (bins). Each sequence could belong to only one bin. Sequences in each gene
bin were assembled to produce consensus sequences (templates). Subsequent new sequences were
added to existing bins using BLASTn (v. 1.4 WashU) and CROSSMATCH. Candidate pairs were
identified as all BLAST hits having a quality score greater than or equal to 150. Alignments of at
least 82% local identity were accepted into the bin. The component sequences from each bin were
assembled using a version of PHRAP. Bins with several overlapping component sequences were
assembled using DEEP PHRAP. The orientation (sense or antisense) of each assembled template was
determined based on the number and orientation of its component sequences. Template sequences as
disclosed in the sequence listing correspond to sense strand sequences (the "forward" reading
frames), to the best determination. The complementary (antisense) strands are inherently disclosed
herein. The component sequences which were used to assemble each template consensus sequence
are listed in Table 4, along with their positions along the template nucleotide sequences.
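The bin-assignment rules above (BLAST hits with a quality score of at least 150, alignments of at least 82% local identity, one bin per sequence) and the orientation call can be sketched in Python as follows. Choosing the highest-scoring qualifying hit when several bins qualify, and resolving orientation ties in favor of sense, are assumptions made for illustration. The `Hit` record and function names are hypothetical, and the PHRAP assembly step itself is not modeled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hit:
    query: str       # ID of the new component sequence
    bin_id: str      # existing gene bin the hit points to
    score: float     # BLAST score of the hit
    identity: float  # local percent identity of the alignment


def assign_to_bins(hits, min_score=150, min_identity=82.0):
    """Map each query to at most one bin, keeping its best qualifying hit."""
    best = {}
    for hit in hits:
        if hit.score < min_score or hit.identity < min_identity:
            continue  # fails the score >= 150 or identity >= 82% rule
        current = best.get(hit.query)
        if current is None or hit.score > current.score:
            best[hit.query] = hit
    return {query: hit.bin_id for query, hit in best.items()}


def template_orientation(strands):
    """Majority vote over component orientations (True = forward)."""
    votes = Counter(strands)
    return "sense" if votes[True] >= votes[False] else "antisense"
```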

Bins were compared against each other and those having local similarity of at least 82% were 
combined and reassembled. Reassembled bins having templates of insufficient overlap (less than 
95% local identity) were re-split. Assembled templates were also subject to analysis by 
STITCHER/EXON MAPPER algorithms which analyze the probabilities of the presence of splice
variants, alternatively spliced exons, splice junctions, differential expression of alternatively spliced
genes across tissue types or disease states, etc. These resulting bins were subject to several rounds of 
the above assembly procedures. 

Once gene bins were generated based upon sequence alignments, bins were clone joined 
based upon clone information. If the 5' sequence of one clone was present in one bin and the 3' 
sequence from the same clone was present in a different bin, it was likely that the two bins actually 
belonged together in a single bin. The resulting combined bins underwent assembly procedures to 
regenerate the consensus sequences. 
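Clone joining as described above amounts to merging bins that contain the 5' and 3' reads of the same clone. A disjoint-set (union-find) structure is one natural way to do this transitively, so that bins linked through a chain of clones end up in one group. The Python sketch below is an illustration under that assumption; the input format, mapping (clone ID, read end) to bin ID, and the function name are hypothetical, and the subsequent reassembly is not shown.

```python
def join_bins_by_clone(read_bins):
    """Merge bins linked by the 5' and 3' reads of the same clone.

    read_bins maps (clone_id, end) -> bin_id, where end is "5'" or "3'".
    Returns bin_id -> representative bin of its merged group.
    """
    parent = {}

    def find(b):
        parent.setdefault(b, b)
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path halving keeps trees shallow
            b = parent[b]
        return b

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    ends_by_clone = {}
    for (clone_id, end), bin_id in read_bins.items():
        find(bin_id)  # register every bin, including unlinked ones
        ends_by_clone.setdefault(clone_id, {})[end] = bin_id

    for ends in ends_by_clone.values():
        if "5'" in ends and "3'" in ends:
            union(ends["5'"], ends["3'"])

    return {b: find(b) for b in list(parent)}
```

For example, if clone c1 links bins A and B and clone c2 links B and C, all three bins map to one representative, while a bin touched by no paired clone stays on its own.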

The final assembled templates were subsequently annotated using the following procedure. 
Template sequences were analyzed using BLASTn (v2.0, NCBI) versus gbpri (GenBank version 116).
"Hits" were defined as an exact match having from 95% local identity over 200 base pairs through 
100% local identity over 100 base pairs, or a homolog match having an E- value, i.e. a probability 
score, of < 1 X 10 ^ The hits were subject to frameshift FASTx versus GENPEPT (GenBank version 
116). (See Table 5.) In this analysis, a homolog match was defined as having an E-value of ≤ 1 x 10'.
The assembly method used above was described in "System and Methods for Analyzing
Biomolecular Sequences," U.S.S.N. 09/276,534, filed March 25, 1999, and the LIFESEQ Gold user
manual (Incyte), both incorporated by reference herein.

Following assembly, template sequences were subjected to motif, BLAST, and functional 
analyses, and categorized in protein hierarchies using methods described in, e.g., "Database System
Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N.
08/812,290, filed March 6, 1997; "Relational Database for Storing Biomolecule Information,"
U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence
Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for
Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998,
all of which are incorporated by reference herein. 

The template sequences were further analyzed by translating each template in all three
forward reading frames and searching each translation against the Pfam database of hidden Markov
model-based protein families and domains using the HMMER software package (available to the
public from Washington University School of Medicine, St. Louis MO). Regions of templates which,
when translated, contain similarity to Pfam consensus sequences are reported in Table 2, along with
descriptions of Pfam protein domains and families. Only those Pfam hits with an E-value of < 1 x 10 -'
are reported. (See also World Wide Web site http://pfam.wustl.edu/ for detailed descriptions of Pfam
protein domains and families.)
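The three-forward-frame translation step described above can be sketched in Python as follows, assuming the standard genetic code and writing stop codons as '*'. Codons containing ambiguous or masked bases (such as N) are rendered as 'X'. Only the translation is shown; the HMMER search against Pfam is not reproduced, and the names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate codon by codon; codons containing non-ACGT bases become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def three_forward_frames(template: str) -> list[str]:
    """Translations starting at offsets 0, 1 and 2 of the forward strand."""
    return [translate(template[offset:]) for offset in range(3)]
```

For example, `three_forward_frames("AATGGCC")` yields `["NG", "MA", "W"]`; any trailing bases that do not fill a complete codon are ignored.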

Additionally, the template sequences were translated in all three forward reading frames, and
each translation was searched against hidden Markov models for signal peptide and transmembrane
domains using the HMMER software package. Construction of hidden Markov models and their
usage in sequence analysis has been described. (See, for example, Eddy, S.R. (1996) Curr. Opin. Str.
Biol. 6:361-365.) Regions of templates which, when translated, contain similarity to signal peptide or
transmembrane domain consensus sequences are reported in Table 3. Only those signal peptide or
transmembrane hits with a cutoff score of 11 bits or greater are reported. A cutoff score of 11 bits or
greater corresponds to at least about 91-94% true-positives in signal peptide prediction, and at least
about 75% true-positives in transmembrane domain prediction.

The results of HMMER analysis as reported in Tables 2 and 3 may support the results of
BLAST analysis as reported in Table 1 or may suggest alternative or additional properties of
template-encoded polypeptides not previously uncovered by BLAST or other analyses.

Template sequences are further analyzed using the bioinformatics tools listed in Table 5, or using sequence analysis software known in the art such as MACDNASIS PRO software (Hitachi Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR). Template sequences may be further queried against public databases such as the GenBank rodent, mammalian, vertebrate, prokaryote, and eukaryote databases.

V. Analysis of Polynucleotide Expression

Northern analysis is a laboratory technique used to detect the presence of a transcript of a gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs from a particular cell type or tissue have been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel, 1995, supra, ch. 4 and 16.)

Analogous computer techniques applying BLAST were used to search for identical or related molecules in cDNA databases such as GenBank or LIFESEQ (Incyte Pharmaceuticals). This analysis is much faster than multiple membrane-based hybridizations. In addition, the sensitivity of the computer search can be modified to determine whether any particular match is categorized as exact or similar. The basis of the search is the product score, which is defined as:

\[
\text{product score} = \frac{\text{BLAST score} \times \text{percent identity}}{5 \times \min\{\text{length}(\text{Seq. } 1),\ \text{length}(\text{Seq. } 2)\}}
\]
The product score takes into account both the degree of similarity between two sequences and the length of the sequence match. The product score is a normalized value between 0 and 100, and is calculated as follows: the BLAST score is multiplied by the percent nucleotide identity, and the product is divided by (5 times the length of the shorter of the two sequences). The BLAST score is calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair (HSP), and -4 for every mismatch. Two sequences may share more than one HSP (separated by gaps). If there is more than one HSP, then the pair with the highest BLAST score is used to calculate the product score. The product score represents a balance between fractional overlap and quality in a BLAST alignment. For example, a product score of 100 is produced only for 100% identity over the entire length of the shorter of the two sequences being compared. A product score of 70 is produced either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap at the other. A product score of 50 is produced either by 100% identity and 50% overlap at one end, or by 79% identity and 100% overlap.
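The product score calculation can be sketched in Python. This sketch assumes each HSP is summarized as ungapped (matches, mismatches) counts and that percent identity is computed over the chosen HSP; the function names are hypothetical.

```python
def blast_score(matches: int, mismatches: int) -> int:
    """Hypothetical helper: score one HSP at +5 per matching base and -4 per mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(score: float, percent_identity: float, len1: int, len2: int) -> float:
    """(BLAST score x percent identity) / (5 x length of the shorter sequence)."""
    return score * percent_identity / (5 * min(len1, len2))


def product_score_from_hsps(hsps: list[tuple[int, int]], len1: int, len2: int) -> float:
    """Use only the highest-scoring HSP, each given as a (matches, mismatches) pair."""
    matches, mismatches = max(hsps, key=lambda h: blast_score(*h))
    identity = 100.0 * matches / (matches + mismatches)
    return product_score(blast_score(matches, mismatches), identity, len1, len2)


# 100% identity over 70 of the 100 bases of the shorter sequence
print(product_score_from_hsps([(70, 0), (20, 5)], 100, 400))
```

Under this +5/-4 scoring, 88% identity over the full length of a 100-base sequence gives about 69, and 79% identity gives about 49. Both values agree, to within rounding, with the figures of 70 and 50 quoted above.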

VI. Tissue Distribution Profiling

A tissue distribution profile is determined for each template by compiling the cDNA library and tissue classifications of its component cDNA sequences. Each component sequence is derived from a cDNA library constructed from a human tissue. Each human tissue is classified into one of the following categories: cardiovascular system; connective tissue; digestive system; embryonic structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract. Template sequences, component sequences, and cDNA library/tissue information are found in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto CA).
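Compiling a tissue distribution profile can be sketched as a tally over a template's component sequences. This is a minimal sketch: expressing the profile as percentages is an assumption, as are the library names and the helper name.

```python
from collections import Counter


def tissue_profile(components: list[tuple[str, str]],
                   library_category: dict[str, str]) -> dict[str, float]:
    """Hypothetical helper: percentage of component sequences per tissue category.

    components: (component_id, cdna_library) pairs belonging to one template.
    library_category: cdna_library -> tissue category, e.g. "nervous system".
    """
    counts = Counter(library_category[lib] for _, lib in components)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Most frequent category first.
    return {category: 100.0 * n / total for category, n in counts.most_common()}
```

For example, four components drawn from two nervous-system libraries (three sequences) and one liver library (one sequence) give 75% nervous system and 25% liver.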


VII. Transcript Image Analysis

Transcript images are generated as described in Seilhamer et al., "Comparative Gene Transcript Analysis," U.S. Patent Number 5,840,484, incorporated herein by reference.

VIII. Extension of Polynucleotide Sequences and Isolation of a Full-length cDNA

Oligonucleotide primers designed using an mddt of the Sequence Listing are used to extend the nucleic acid sequence. One primer is synthesized to initiate 5' extension of the template, and the other primer to initiate 3' extension of the template. The initial primers may be designed using OLIGO 4.06 software (National Biosciences, Inc. (National Biosciences), Plymouth MN), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68°C to about 72°C. Any stretch of nucleotides that would result in hairpin structures or primer-primer dimerizations is avoided. Selected human cDNA libraries are used to extend the sequence. If more than one extension is necessary or desired, additional or nested sets of primers are designed.
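The length and GC-content screens for the initial primers can be sketched in Python, together with a crude self-complementarity check. This is a minimal sketch under stated assumptions: the 6-nucleotide k-mer test is only a hypothetical proxy for hairpin and dimer potential, "about 50%" GC is treated as a floor of 0.5, and melting temperature is not estimated (that is left to OLIGO 4.06 or similar software). All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def has_self_complementary_kmer(primer: str, k: int = 6) -> bool:
    """Hypothetical proxy: True if a k-mer's reverse complement also occurs in the primer."""
    p = primer.upper()
    kmers = {p[i:i + k] for i in range(len(p) - k + 1)}
    return any(reverse_complement(km) in kmers for km in kmers)


def passes_basic_criteria(primer: str, min_len: int = 22, max_len: int = 30,
                          min_gc: float = 0.5, k: int = 6) -> bool:
    """Length and GC screens (thresholds assumed from the text) plus the k-mer check."""
    return (min_len <= len(primer) <= max_len
            and gc_fraction(primer) >= min_gc
            and not has_self_complementary_kmer(primer, k))
```

A 24-mer built from repeats of the palindromic site GAATTC fails the check, because each copy of the site is its own reverse complement.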

High fidelity amplification is obtained by PCR using methods well known in the art. PCR is performed in 96-well plates using the PTC-200 thermal cycler (MJ Research). The reaction mix contains DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. In the alternative, the parameters for primer pair T7 and SK+ are as follows: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C.

The concentration of DNA in each well is determined by dispensing 100 µl PICOGREEN quantitation reagent (0.25% (v/v); Molecular Probes) dissolved in 1X Tris-EDTA (TE) and 0.5 µl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Incorporated (Corning), Corning NY), allowing the DNA to bind to the reagent. The plate is scanned in a FLUOROSKAN II (Labsystems Oy) to measure the fluorescence of the sample and to quantify the concentration of DNA. A 5 µl to 10 µl aliquot of the reaction mixture is analyzed by electrophoresis on a 1% agarose mini-gel to determine which reactions are successful in extending the sequence.

The extended nucleotides are desalted and concentrated, transferred to 384-well plates, digested with CviJI Chlorella virus endonuclease (Molecular Biology Research, Madison WI), and sonicated or sheared prior to religation into pUC18 vector (Amersham Pharmacia Biotech). For shotgun sequencing, the digested nucleotides are separated on low concentration (0.6 to 0.8%) agarose gels, fragments are excised, and the agar is digested with AGARACE (Promega). Extended clones are religated using T4 ligase (New England Biolabs, Inc., Beverly MA) into pUC18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill in restriction site overhangs, and transfected into competent E. coli cells. Transformed cells are selected on antibiotic-containing media, and individual colonies are picked and cultured overnight at 37°C in 384-well plates in LB/2x carbenicillin liquid media.

The cells are lysed, and DNA is amplified by PCR using Taq DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA is quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries are reamplified using the same conditions as described above. Samples are diluted with 20% dimethylsulfoxide (1:2, v/v) and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (PE Biosystems).

In like manner, the mddt is used to obtain regulatory sequences (promoters, introns, and 
enhancers) using the procedure above, oligonucleotides designed for such extension, and an 
appropriate genomic library. 

IX. Labeling of Probes and Southern Hybridization Analyses

Hybridization probes derived from the mddt of the Sequence Listing are employed for screening cDNAs, mRNAs, or genomic DNA. The labeling of probe nucleotides between 100 and 1000 nucleotides in length is specifically described, but essentially the same procedure may be used with larger cDNA fragments. Probe sequences are labeled at room temperature for 30 minutes using T4 polynucleotide kinase, [γ-32P] ATP, and 0.5X One-Phor-All Plus (Amersham Pharmacia Biotech) buffer and purified using a ProbeQuant G-50 Microcolumn (Amersham Pharmacia Biotech). The probe mixture is diluted to 10' dpm/µg/ml hybridization buffer and used in a typical membrane-based hybridization analysis.

The DNA is digested with a restriction endonuclease such as Eco RV and is electrophoresed through a 0.7% agarose gel. The DNA fragments are transferred from the agarose to nylon membrane (NYTRAN Plus, Schleicher & Schuell, Inc., Keene NH) using procedures specified by the manufacturer of the membrane. Prehybridization is carried out for three or more hours at 68°C, and hybridization is carried out overnight at 68°C. To remove non-specific signals, blots are sequentially washed at room temperature under increasingly stringent conditions, up to 0.1x saline sodium citrate (SSC) and 0.5% sodium dodecyl sulfate. After the blots are placed in a PHOSPHORIMAGER cassette (Molecular Dynamics) or are exposed to autoradiography film, hybridization patterns of standard and experimental lanes are compared. Essentially the same procedure is employed when screening RNA.

X. Chromosome Mapping of mddt 

The cDNA sequences which were used to assemble SEQ ID NO: 1-14 are compared with sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other implementations of the Smith-Waterman algorithm. Sequences from these databases that match SEQ ID NO: 1-14 are assembled into clusters of contiguous and overlapping sequences using assembly algorithms such as PHRAP (Table 5). Radiation hybrid and genetic mapping data available from public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome Research (WIGR), and Genethon are used to determine if any of the clustered sequences have been previously mapped. Inclusion of a mapped sequence in a cluster will result in the assignment of all sequences of that cluster, including its particular SEQ ID NO:, to that map location.

The genetic map locations of SEQ ID NO: 1-14 are described as ranges, or intervals, of human chromosomes. The map position of an interval, in centiMorgans, is measured relative to the terminus of the chromosome's p-arm. (The centiMorgan (cM) is a unit of measurement based on recombination frequencies between chromosomal markers. On average, 1 cM is roughly equivalent to 1 megabase (Mb) of DNA in humans, although this can vary widely due to hot and cold spots of recombination.) The cM distances are based on genetic markers mapped by Genethon which provide boundaries for radiation hybrid markers whose sequences were included in each of the clusters.
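The rule that a single mapped member places its whole cluster at that location can be sketched as below. This is a minimal sketch: the cluster and sequence IDs and the location strings are hypothetical, and PHRAP clustering itself is not reproduced. The source does not say how to handle clusters whose mapped members disagree, so the sketch leaves them unassigned; that behavior is an assumption.

```python
def assign_map_locations(clusters: dict[str, set[str]],
                         mapped: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: propagate a known map location to every cluster member.

    clusters: cluster_id -> IDs of the contiguous/overlapping sequences in it.
    mapped:   sequence ID -> previously determined map location.
    Clusters with conflicting mapped members are skipped (an assumption).
    """
    assignments: dict[str, str] = {}
    for members in clusters.values():
        locations = {mapped[s] for s in members if s in mapped}
        if len(locations) == 1:
            location = next(iter(locations))
            for s in members:
                assignments[s] = location
    return assignments
```

Clusters that contain no previously mapped sequence receive no assignment.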

XI. Microarray Analysis

Probe Preparation from Tissue or Cell Samples
Total RNA is isolated from tissue samples using the guanidinium thiocyanate method, and polyA+ RNA is purified using the oligo (dT) cellulose method. Each polyA+ RNA sample is reverse transcribed using MMLV reverse-transcriptase, 0.05 µg/µl oligo-dT primer (21mer), 1X first strand buffer, 0.03 units/µl RNase inhibitor, 500 µM dATP, 500 µM dGTP, 500 µM dTTP, 40 µM dCTP, and 40 µM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech). The reverse transcription reaction is performed in a 25 ml volume containing 200 ng polyA+ RNA with GEMBRIGHT kits (Incyte). Specific control polyA+ RNAs are synthesized by in vitro transcription from non-coding yeast genomic DNA (W. Lei, unpublished). As quantitative controls, the control mRNAs at 0.002 ng, 0.02 ng, 0.2 ng, and 2 ng are diluted into the reverse transcription reaction at ratios of 1:100,000, 1:10,000, 1:1000, and 1:100 (w/w) to sample mRNA, respectively. As controls for differential expression patterns, the control mRNAs are diluted into the reverse transcription reaction at ratios of 1:3, 3:1, 1:10, 10:1, 1:25, and 25:1 (w/w) to sample mRNA. After incubation at 37°C for 2 hr, each reaction sample (one with Cy3 and another with Cy5 labeling) is treated with 2.5 ml of 0.5M sodium hydroxide and incubated for 20 minutes at 85°C to stop the reaction and degrade the RNA. Probes are purified using two successive CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc. (CLONTECH), Palo Alto CA) and, after combining, both reaction samples are ethanol precipitated using 1 ml of glycogen (1 mg/ml), 60 ml sodium acetate, and 300 ml of 100% ethanol. The probe is then dried to completion using a SpeedVAC (Savant Instruments Inc., Holbrook NY) and resuspended in 14 µl 5X SSC/0.2% SDS.
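The quantitative-control amounts can be checked arithmetically. Each listed amount equals the 200 ng of polyA+ RNA in the reaction divided by the corresponding ratio, which reproduces 0.002, 0.02, 0.2, and 2 ng. The helper name below is hypothetical.

```python
SAMPLE_POLYA_NG = 200  # polyA+ RNA per reverse transcription reaction, from the text


def control_amount_ng(ratio_denominator: int, sample_ng: float = SAMPLE_POLYA_NG) -> float:
    """Hypothetical helper: control mRNA mass giving a 1:ratio_denominator (w/w) ratio."""
    return sample_ng / ratio_denominator


for denominator in (100_000, 10_000, 1_000, 100):
    print(f"1:{denominator:,} -> {control_amount_ng(denominator)} ng")
```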
Microarray Preparation

Sequences of the present invention are used to generate array elements. Each array element is amplified from bacterial cells containing vectors with cloned cDNA inserts. PCR amplification uses primers complementary to the vector sequences flanking the cDNA insert. Array elements are amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5 µg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia Biotech).

Purified array elements are immobilized on polymer-coated glass slides. Glass microscope slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water washes between and after treatments. Glass slides are etched in 4% hydrofluoric acid (VWR Scientific Products Corporation (VWR), West Chester, PA), washed extensively in distilled water, and coated with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a 110°C oven.

Array elements are applied to the coated glass substrate using a procedure described in US Patent No. 5,807,522, incorporated herein by reference. A 1 µl volume of the array element DNA, at an average concentration of 100 ng/µl, is loaded into the open capillary printing element by a high-speed robotic apparatus. The apparatus then deposits about 5 nl of array element sample per slide.

Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene). Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water. Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate buffered saline (PBS) (Tropix, Inc., Bedford, MA) for 30 minutes at 60°C, followed by washes in 0.2% SDS and distilled water as before.

Hybridization 

Hybridization reactions contain 9 µl of probe mixture consisting of 0.2 µg each of Cy3- and Cy5-labeled cDNA synthesis products in 5X SSC, 0.2% SDS hybridization buffer. The probe mixture is heated to 65°C for 5 minutes, aliquoted onto the microarray surface, and covered with a 1.8 cm² coverslip. The arrays are transferred to a waterproof chamber having a cavity just slightly larger than a microscope slide. The chamber is kept at 100% humidity internally by the addition of 140 µl of 5x SSC in a corner of the chamber. The chamber containing the arrays is incubated for about 6.5 hours at 60°C. The arrays are washed for 10 min at 45°C in a first wash buffer (1X SSC, 0.1% SDS), three times for 10 minutes each at 45°C in a second wash buffer (0.1X SSC), and dried.

Detection 

Reporter-labeled hybridization complexes are detected with a microscope equipped with an Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara CA) capable of generating spectral lines at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5. The excitation laser light is focused on the array using a 20X microscope objective (Nikon, Inc., Melville NY). The slide containing the array is placed on a computer-controlled X-Y stage on the microscope and raster-scanned past the objective. The 1.8 cm x 1.8 cm array used in the present example is scanned with a resolution of 20 micrometers.

In two separate scans, a mixed gas multiline laser excites the two fluorophores sequentially. Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477, Hamamatsu Photonics Systems, Bridgewater NJ) corresponding to the two fluorophores. Appropriate filters positioned between the array and the photomultiplier tubes are used to filter the signals. The emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5. Each array is typically scanned twice, one scan per fluorophore using the appropriate filters at the laser source, although the apparatus is capable of recording the spectra from both fluorophores simultaneously.
The sensitivity of the scans is typically calibrated using the signal intensity generated by a cDNA control species added to the probe mix at a known concentration. A specific location on the array contains a complementary DNA sequence, allowing the intensity of the signal at that location to be correlated with a weight ratio of hybridizing species of 1:100,000. When two probes from different sources (e.g., representing test and control cells), each labeled with a different fluorophore, are hybridized to a single array for the purpose of identifying genes that are differentially expressed, the calibration is done by labeling samples of the calibrating cDNA with the two fluorophores and adding identical amounts of each to the hybridization mixture.

The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital (A/D) conversion board (Analog Devices, Inc., Norwood, MA) installed in an IBM-compatible PC computer. The digitized data are displayed as an image where the signal intensity is mapped using a linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high signal). The data are also analyzed quantitatively. Where two different fluorophores are excited and measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping emission spectra) between the fluorophores using each fluorophore's emission spectrum.
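The text does not spell out how crosstalk is corrected. One standard approach, substituted here for illustration, is to treat bleed-through as linear and solve the resulting 2x2 system for the true Cy3 and Cy5 signals. The function name and the bleed-through coefficients are hypothetical. In practice, the coefficients would be derived from each fluorophore's emission spectrum as seen through each channel's filter.

```python
def correct_crosstalk(obs_cy3: float, obs_cy5: float,
                      cy5_into_cy3: float, cy3_into_cy5: float) -> tuple[float, float]:
    """Hypothetical helper: linear two-channel unmixing, assuming

        obs_cy3 = true_cy3 + cy5_into_cy3 * true_cy5
        obs_cy5 = cy3_into_cy5 * true_cy3 + true_cy5

    where each coefficient is the fraction of one dye's emission collected
    in the other dye's channel.
    """
    det = 1.0 - cy5_into_cy3 * cy3_into_cy5
    if det == 0:
        raise ValueError("bleed-through coefficients make the system singular")
    true_cy3 = (obs_cy3 - cy5_into_cy3 * obs_cy5) / det
    true_cy5 = (obs_cy5 - cy3_into_cy5 * obs_cy3) / det
    return true_cy3, true_cy5
```

For instance, with 5% of Cy5 emission leaking into the Cy3 channel and 10% of Cy3 emission leaking into the Cy5 channel, observed values of 1020 and 500 unmix to 1000 and 400.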

A grid is superimposed over the fluorescence signal image such that the signal from each spot is centered in each element of the grid. The fluorescence signal within each element is then integrated to obtain a numerical value corresponding to the average intensity of the signal. The software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte).
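The per-element averaging and the linear 20-color display mapping described above can be sketched as follows. This is not the GEMTOOLS implementation. The sketch assumes the grid is already aligned with the spots and that the image dimensions are exact multiples of the grid size; the function names are hypothetical.

```python
def grid_means(image: list[list[float]], n_rows: int, n_cols: int) -> list[list[float]]:
    """Hypothetical helper: mean pixel intensity inside each element of an n_rows x n_cols grid.

    Assumes the grid is aligned so each spot sits in one element and that the image
    dimensions are multiples of the grid size.
    """
    cell_h = len(image) // n_rows
    cell_w = len(image[0]) // n_cols
    means = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            pixels = [image[y][x]
                      for y in range(r * cell_h, (r + 1) * cell_h)
                      for x in range(c * cell_w, (c + 1) * cell_w)]
            row.append(sum(pixels) / len(pixels))
        means.append(row)
    return means


def pseudocolor_bin(value: int, n_colors: int = 20, bits: int = 12) -> int:
    """Map a digitized intensity (0 .. 2**bits - 1) linearly onto n_colors bins,
    from 0 (blue, low signal) to n_colors - 1 (red, high signal)."""
    return value * n_colors // (1 << bits)
```

With the 12-bit A/D board, a reading of 0 falls in bin 0, 2048 in bin 10, and 4095 in bin 19.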

XII. Complementary Nucleic Acids 
Sequences complementary to the mddt are used to detect, decrease, or inhibit expression of the naturally occurring nucleotide. The use of oligonucleotides comprising from about 15 to 30 base pairs is typical in the art. However, smaller or larger sequence fragments can also be used. Appropriate oligonucleotides are designed from the mddt using OLIGO 4.06 software (National Biosciences) or other appropriate programs and are synthesized using methods standard in the art or ordered from a commercial supplier. To inhibit transcription, a complementary oligonucleotide is designed from the most unique 5' sequence and used to prevent transcription factor binding to the promoter sequence. To inhibit translation, a complementary oligonucleotide is designed to prevent ribosomal binding and processing of the transcript.
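Deriving a complementary oligonucleotide from a chosen target region comes down to taking the reverse complement. This is a minimal sketch: the caller supplies the region start, and the selection of the "most unique" 5' region is not modeled. The 20-nucleotide default length and the function name are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_oligo(target: str, start: int, length: int = 20) -> str:
    """Hypothetical helper: oligonucleotide (5'->3') complementary to target[start:start + length].

    target is the sense-strand DNA sequence written 5'->3'; start is 0-based.
    """
    region = target[start:start + length]
    if len(region) != length:
        raise ValueError("requested region runs past the end of the target")
    # Complement each base, then reverse so the result reads 5'->3'.
    return region.translate(COMPLEMENT)[::-1]
```

For the sense region ATGGCCAAGCTTGGA, the complementary oligonucleotide is TCCAAGCTTGGCCAT.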

XIII. Expression of MDDT

Expression and purification of MDDT is accomplished using bacterial or virus-based expression systems. For expression of MDDT in bacteria, cDNA is subcloned into an appropriate vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of cDNA transcription. Examples of such promoters include, but are not limited to, the trp-lac (tac) hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator regulatory element. Recombinant vectors are transformed into suitable bacterial hosts, e.g., BL21(DE3). Antibiotic resistant bacteria express MDDT upon induction with isopropyl beta-D-thiogalactopyranoside (IPTG). Expression of MDDT in eukaryotic cells is achieved by infecting insect or mammalian cell lines with recombinant Autographa californica nuclear polyhedrosis virus (AcMNPV), commonly known as baculovirus. The nonessential polyhedrin gene of baculovirus is replaced with cDNA encoding MDDT by either homologous recombination or bacterial-mediated transposition involving transfer plasmid intermediates. Viral infectivity is maintained, and the strong polyhedrin promoter drives high levels of cDNA transcription. Recombinant baculovirus is used to infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes in some cases. Infection of the latter requires additional genetic modifications to baculovirus. (See, e.g., Engelhard, supra; and Sandig, supra.)

In most expression systems, MDDT is synthesized as a fusion protein with, e.g., glutathione S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step, affinity-based purification of recombinant fusion protein from crude cell lysates. GST, a 26-kilodalton enzyme from Schistosoma japonicum, enables the purification of fusion proteins on immobilized glutathione under conditions that maintain protein activity and antigenicity (Amersham Pharmacia Biotech). Following purification, the GST moiety can be proteolytically cleaved from MDDT at specifically engineered sites. FLAG, an 8-amino acid peptide, enables immunoaffinity purification using commercially available monoclonal and polyclonal anti-FLAG antibodies (Eastman Kodak Company, Rochester NY). 6-His, a stretch of six consecutive histidine residues, enables purification on metal-chelate resins (QIAGEN). Methods for protein expression and purification are discussed in Ausubel (1995, supra, Chapters 10 and 16). Purified MDDT obtained by these methods can be used directly in the following activity assay.

XIV. Demonstration of MDDT Activity 

MDDT, or biologically active fragments thereof, are labeled with 125I Bolton-Hunter reagent. (See, e.g., Bolton, A.E. and W.M. Hunter (1973) Biochem. J. 133:529-539.) Candidate molecules previously arrayed in the wells of a multi-well plate are incubated with the labeled MDDT, washed, and any wells with labeled MDDT complex are assayed. Data obtained using different concentrations of MDDT are used to calculate values for the number, affinity, and association of MDDT with the candidate molecules.
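The text does not say how the number and affinity of binding sites are computed. A Scatchard analysis is one classical option for a single class of sites, and it is substituted here for illustration. Bound/free (B/F) is regressed against bound (B). The slope is -1/Kd, and the x-intercept is the number of sites, Bmax. The function name is hypothetical. Nonlinear fitting of the saturation curve is often preferred, because the Scatchard transformation distorts measurement error.

```python
def scatchard_fit(bound: list[float], free: list[float]) -> tuple[float, float]:
    """Hypothetical helper: least-squares fit of B/F = (Bmax - B) / Kd.

    bound, free: bound and free concentrations of labeled ligand at each dose.
    Returns (Kd, Bmax), using slope = -1/Kd and x-intercept = Bmax.
    """
    xs = list(bound)
    ys = [b / f for b, f in zip(bound, free)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    bmax = -intercept / slope
    return kd, bmax
```

With ideal single-site data generated from Kd = 2 and Bmax = 10, the fit recovers both values.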

Alternatively, molecules interacting with MDDT are analyzed using the yeast two-hybrid 
system as described in Fields, S. and O. Song (1989) Nature 340:245-246, or using commercially 
available kits based on the two-hybrid system, such as the MATCHMAKER system (CLONTECH). 

MDDT may also be used in the PATHCALLING process (CuraGen Corp., New Haven CT) 
which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions 
between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S. 
Patent No. 6,057,101). 

XV. Functional Assays 

MDDT function is assessed by expressing mddt at physiologically elevated levels in mammalian cell culture systems. cDNA is subcloned into a mammalian expression vector containing a strong promoter that drives high levels of cDNA expression. Vectors of choice include pCMV SPORT (Life Technologies) and pCR3.1 (Invitrogen Corporation, Carlsbad CA), both of which contain the cytomegalovirus promoter. 5-10 µg of recombinant vector are transiently transfected into a human cell line, preferably of endothelial or hematopoietic origin, using either liposome formulations or electroporation. 1-2 µg of an additional plasmid containing sequences encoding a marker protein are co-transfected.

Expression of a marker protein provides a means to distinguish transfected cells from nontransfected cells and is a reliable predictor of cDNA expression from the recombinant vector. Marker proteins of choice include, e.g., Green Fluorescent Protein (GFP; CLONTECH), CD64, or a CD64-GFP fusion protein. Flow cytometry (FCM), an automated laser optics-based technique, is used to identify transfected cells expressing GFP or CD64-GFP and to evaluate the apoptotic state of the cells and other cellular properties.
FCM detects and quantifies the uptake of fluorescent molecules that diagnose events preceding or coincident with cell death. These events include changes in nuclear DNA content as measured by staining of DNA with propidium iodide; changes in cell size and granularity as measured by forward light scatter and 90 degree side light scatter; down-regulation of DNA synthesis as measured by decrease in bromodeoxyuridine uptake; alterations in expression of cell surface and intracellular proteins as measured by reactivity with specific antibodies; and alterations in plasma membrane composition as measured by the binding of fluorescein-conjugated Annexin V protein to the cell surface. Methods in flow cytometry are discussed in Ormerod, M.G. (1994) Flow Cytometry, Oxford, New York NY.

The influence of MDDT on gene expression can be assessed using highly purified populations of cells transfected with sequences encoding MDDT and either CD64 or CD64-GFP. CD64 and CD64-GFP are expressed on the surface of transfected cells and bind to conserved regions of human immunoglobulin G (IgG). Transfected cells are efficiently separated from nontransfected cells using magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Inc., Lake Success NY). mRNA can be purified from the cells using methods well known by those of skill in the art. Expression of mRNA encoding MDDT and other genes of interest can be analyzed by northern analysis or microarray techniques.

XVI. Production of Antibodies 

MDDT substantially purified using polyacrylamide gel electrophoresis (PAGE; see, e.g., Harrington, M.G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to immunize rabbits and to produce antibodies using standard protocols.

Alternatively, the MDDT amino acid sequence is analyzed using LASERGENE software (DNASTAR) to determine regions of high immunogenicity, and a corresponding peptide is synthesized and used to raise antibodies by means known to those of skill in the art. Methods for selection of appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well described in the art. (See, e.g., Ausubel, 1995, supra, Chapter 11.)

Typically, peptides 15 residues in length are synthesized using an ABI 431A peptide synthesizer (PE Biosystems) using fmoc chemistry and coupled to KLH (Sigma) by reaction with N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to increase immunogenicity. (See, e.g., Ausubel, supra.) Rabbits are immunized with the peptide-KLH complex in complete Freund's adjuvant. Resulting antisera are tested for antipeptide activity by, for example, binding the peptide to plastic, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti-rabbit IgG. Antisera with antipeptide activity are tested for anti-MDDT activity using protocols well known in the art, including ELISA, RIA, and immunoblotting.
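The method LASERGENE uses is not given here. As a substitute technique for locating hydrophilic candidate regions, a sliding-window Kyte-Doolittle hydropathy scan can be sketched. The window length of 15 matches the typical peptide length stated above. The function name is hypothetical, and hydrophilicity is only one of several criteria for epitope selection.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophilic_window(protein: str, window: int = 15) -> tuple[int, str, float]:
    """Hypothetical helper: (0-based start, peptide, mean hydropathy) of the lowest-scoring window."""
    protein = protein.upper()
    if len(protein) < window:
        raise ValueError("protein shorter than the window")
    scores = [
        sum(KYTE_DOOLITTLE[aa] for aa in protein[i:i + window]) / window
        for i in range(len(protein) - window + 1)
    ]
    # min() returns the first window among any ties.
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, protein[best:best + window], scores[best]
```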
XVII. Purification of Naturally Occurring MDDT Using Specific Antibodies

Naturally occurring or recombinant MDDT is substantially purified by immunoaffinity chromatography using antibodies specific for MDDT. An immunoaffinity column is constructed by covalently coupling anti-MDDT antibody to an activated chromatographic resin, such as CNBr-activated SEPHAROSE (Amersham Pharmacia Biotech). After the coupling, the resin is blocked and washed according to the manufacturer's instructions.

Media containing MDDT are passed over the immunoaffinity column, and the column is washed under conditions that allow the preferential adsorption of MDDT (e.g., high ionic strength buffers in the presence of detergent). The column is eluted under conditions that disrupt antibody/MDDT binding (e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope such as urea or thiocyanate ion), and MDDT is collected.

All publications and patents mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the above-described modes for carrying out the invention which are obvious to those skilled in the field of molecular biology or related fields are intended to be within the scope of the following claims.
TABLE 3

SEQ ID NO:  Template ID  Start  Stop  Frame      Domain Type
1           222197.6     317    406   forward 2  SP
1           222197.6     901    984   forward 1  TM
2           227709.3     563    649   forward 2  SP
5           243096.6     3096   3182  forward 3  SP
6           244366.6     2801   2878  forward 2  TM
7           405313.4     2256   2333  forward 3  TM
7           405313.4     1503   1589  forward 3  TM

SP = signal peptide; TM = transmembrane domain.
TABLE 4

SEQ ID NO:  Template ID  Component ID   Start  Stop
1           222197.6     3989355H1      1      122
1           222197.6     3989355R6      1      462
1           222197.6     g1189739       58     533
1           222197.6     g1123521       56     494
1           222197.6     341788dH2      105    341
1           222197.6     3398916H1      111    329
1           222197.6     696738H1       228    480
1           222197.6     3387328H1      248    642
1           222197.6     3387328F6      248    705
1           222197.6     640954H1       499    771
1           222197.6     640954R1       499    841
1           222197.6     2674395H1      544    647
1           222197.6     4871937H1      609    809
1           222197.6     6014949H1      680    954
1           222197.6     1310167H1      729    952
1           222197.6     1310167F6      729    1153
1           222197.6     3422058H1      733    986
1           222197.6     1429773H1      748    1016
1           222197.6     1429773F6      748    1211
1           222197.6     4725459H1      770    889
1           222197.6     2692245H1      774    1025
1           222197.6     2692245F6      774    1300
1           222197.6     2658283H1      818    1051
1           222197.6     4402233H1      847    1083
1           222197.6     673783H1       847    1089
1           222197.6     487422H1       871    1123
1           222197.6     3928678H1      898    1175
1           222197.6     2641613F6      1019   1494
1           222197.6     2641613H1      1019   1259
1           222197.6     2770396H1      1027   1273
1           222197.6     2599469H1      1040   1311
1           222197.6     486115H1       1181   1456
1           222197.6     1626615H1      1247   1456
1           222197.6     1626615F6      1247   1728
1           222197.6     383522H1       1260   1526
1           222197.6     3355867H1      1261   1532
1           222197.6     3617236H1      1286   1573
1           222197.6     3510978H1      1300   1567
1           222197.6     1568105H1      1325   1446
1           222197.6     1571377H1      1325   1550
1           222197.6     3806389H1      1368   1628
1           222197.6     g774888        1369   1729
1           222197.6     2995341H1      1382   1634
1           222197.6     5547807H1      1412   1611
1           222197.6     619375H1       1501   1738
1           222197.6     g1962367       1501   1997
1           222197.6     2695323H1      1517   1790
1           222197.6     3142880H1      1618   1792
1           222197.6     3805357H1      1518   1820
1           222197.6     1962884H1      1538   1808
1           222197.6     1807088F6      1555   2037
1           222197.6     1400677H1      1628   1894
1           222197.6     811652H1       1650   1954
1           222197.6     3048110H1      1665   1918
1           222197.6     3048110F6      1665   1992
1           222197.6     3048102H1      1665   1965
1           222197.6     5059358H1      1668   1965
1           222197.6     2150138H1      1671   1925
1           222197.6     039720H1       1711   1970
1           222197.6     2434795H1      3030   3132
1           222197.6     2130701H1      3039   3136
1           222197.6     2647674H1      1740   1841
1           222197.6     3448915H1      1810   2065
1           222197.6     4377215H1      1841   2043
1           222197.6     4317587H1      1841   1916
1           222197.6     1851036H1      1841   2033
1           222197.6     5108761H1      1861   2107
1           222197.6     2893266H1      1873   2139
1           222197.6     1568060H1      1878   2083
1           222197.6     1568004H1      1878   2097
1           222197.6     3531152H1      1889   2207
1           222197.6     2851378H1      1922   2260
1           222197.6     1931202H1      1922   2196
1           222197.6     2132091R6      1936   2208
1           222197.6     2132091H1      1936   2096
1           222197.6     5288578H1      1941   2067
1           222197.6     g2110737       1944   2226
1           222197.6     2207507H1      1996   2250
1           222197.6     2345452H1      1996   2256
1           222197.6     3414767H1      2025   2265
1           222197.6     1306171F6      2042   2378
1           222197.6     1306171H1      2042   2284
1           222197.6     4588038H1      2066   2349
1           222197.6     4587760H1      2066   2221
1           222197.6     1389765H1      2108   2367
1           222197.6     g1137612       2112   2425
1           222197.6     2663717H1      2146   2388
1           222197.6     3321090H1      2152   2435
1           222197.6     3840347H1      2151   2339
1           222197.6     2415559F6      2175   2618
1           222197.6     2415559H1      2175   2420
1           222197.6     3146224H1      2178   2430
1           222197.6     4201740H1      2178   2451
1           222197.6     3713261H1      2194   2446
1           222197.6     5900620H1      2195   2484
1           222197.6     1239238H1      2205   2358
1           222197.6     1965353R6      2235   2691
1           222197.6     1965353H1      2235   2500
1           222197.6     1471606H1      2273   2483
1           222197.6     3929839H1      2278   2578
PCT/US00/15344

1           222197.6     1521966H1      2278   2474
1           222197.6     1449385H1      2291   2533
1           222197.6     2381896H1      2307   2560
1           222197.6     2381895H1      2307   2559
1           222197.6     4643037H1      2326   2554
1           222197.6     3703243H1      2351   2650
1           222197.6     4295130H1      2350   2618
1           222197.6     4296184H1      2350   2589
1           222197.6     5841522H2      2396   2675
1           222197.6     3790395F6      2413   2974
1           222197.6     3811615H1      2413   2745
1           222197.6     2353349H1      2414   2513
1           222197.6     569817H1       2413   2660
1           222197.6     1621792H1      2413   2625
1           222197.6     520835H1       2417   2637
1           222197.6     g2161759       2433   2797
1           222197.6     1463438H1      2433   2622
1           222197.6     1621792T6      2437   3096
1           222197.6     3475011H1      2442   2682
1           222197.6     1969764H1      2447   2686
1           222197.6     4188958H1      2451   2774
1           222197.6     2134836H1      2458   2578
1           222197.6     4054390H1      2467   2749
1           222197.6     5185060H1      2468   2694
1           222197.6     4058390H1      2468   2580
1           222197.6     4024007H1      2469   2784
1           222197.6     5597388H1      2493   2771
1           222197.6     3934968H1      2512   2787
1           222197.6     2770396T6      2518   3095
1           222197.6     993964H1       2526   2698
1           222197.6     1807088T6      2531   3099
1           222197.6     2132091T6      2530   3101
1           222197.6     1965395T6      2532   3098
1           222197.6     1805709H1      2532   2781
1           222197.6     4466288H1      2537   2803
1           222197.6     3020435H1      2537   2821
1           222197.6     g2355832       2538   3035
1           222197.6     1672661H1      2554   2667
1           222197.6     1881147H1      2554   2807
1           222197.6     5098316H1      2573   2856
1           222197.6     1429773T6      2573   3089
1           222197.6     1626615T6      2584   3091
1           222197.6     1479854T6      2587   3117
1           222197.6     3935053H1      2598   2897
1           222197.6     3930918H1      2598   2915
1           222197.6     1654064H1      2608   2850
1           222197.6     2951301H1      2619   2908
1           222197.6     g4223642       2627   3028
1           222197.6     2752320H1      2628   2928
1           222197.6     g2161260       2634   3031
1           222197.6     1306171T6      2636   3099
1           222197.6     3387328T6      2640   3092
1           222197.6     811652T6       2639   3097
1           222197.6     1959841H1      2652   2915
1           222197.6     1959841T6      2652   3094
1           222197.6     1959841R6      2652   3110
1           222197.6     g4265714       2655   3143
1           222197.6     g4186863       2662   3139
1           222197.6     g3249761       2663   3146
1           222197.6     g4114970       2663
1           222197.6     70508CM41      2666   2908
1           222197.6     2045743H1      2673   2971


222197.6 


2415559T6 


2675 




222197.6 


94394362 


2678 




222197.6 


92617967 


2680 




222197.6 


91548565 






222197.6 


3790395T6 


2684 


ion 


222197.6 


a4393109 




O 1 oO 


222197.6 


1 91 5761 H 7 


9AAO 




222197.6 


q2 153774 


97CO 


o loo 


222197.6 


996350H1 




^VoU 


222197.6 


996350R1 


2729 

*— / 4_,7 




222197.6 


996350T1 


2729 




222197.6 


997484H1 


2731 




222197.6 


4801 76T6 


2736 


9O0n 


222197.6 


4801 76R6 


2736 




222197.6 


912187H1 


2744 




222197.6 


1313558H1 


2744 




222197.6 


9656198 


2763 




222197.6 


5057011 HI 


2784 


snap 

WWtl7 


1           222197.6     2260766H1      2792   3069
1           222197.6     g3737413       2803   3139
1           222197.6     1686080H1      2810
1           222197.6     g821623        2838   3148
1           222197.6     2641613T6      2833   3094
1           222197.6     2328044H1      2845   3113
1           222197.6     g4371777       2859   3141
1           222197.6     g1516072       2860   3144
1           222197.6     g2433045       2865   3091
1           222197.6     2648474H1      2866   3123
1           222197.6     2434653H1      2871   3069
1           222197.6     1878079H1      2881   3147
1           222197.6     g4109641       2896   3139
1           222197.6     2428514H1      2909   3098
1           222197.6     4146636H1      2909   3172
1           222197.6     4703a38H1      2929   3139
1           222197.6     3125420H1      2934   3139
1           222197.6     5942277H1      2971   3137
2           227709.3     783646H1       1577   1867
2           227709.3     2314211H1      1585   1834
WO 00/75298

2           227709.3     342368H1       1595   1832
2           227709.3     1833241R6      1595   2004
2           227709.3     1833241H1      1595   1853
2           227709.3     154117H1       1609   1752
2           227709.3     4938186H1      1617   1905
2           227709.3     4912871H1      1628   1918
2           227709.3     2243937H1      1653   1871
2           227709.3     876350H1       1530   1682
2           227709.3     4959782H1      1532   1785
2           227709.3     1839309H1      1663   1974
2           227709.3     1378304H1      1695   1940
2           227709.3     1660925H1      1696   1936
2           227709.3     1894917H1      1696   1912
2           227709.3     2394914H1      1716   1815
2           227709.3     3035674H1      1714   2034
2           227709.3     808578H1       1714   2019
2           227709.3     3035136H1      1714   2035
2           227709.3     808578R1       1714   2351
2           227709.3     3852902H1      1716   1984
2           227709.3     3122052H1      1729   2075
2           227709.3     6107590H1      1729   2046
2           227709.3     5925422H1      1735   2039
2           227709.3     4378762H1      1746   2059
2           227709.3     3781168H1      1763   1958
2           227709.3     2469542H1      1763   2006
2           227709.3     4585384H1      1765   2071
2           227709.3     3772159H1      1772   2087
2           227709.3     1436901F6      1787   2216
2           227709.3     1436902H1      1787   2081
2           227709.3     1436902F1      1787   2417
2           227709.3     732765H1       1796   2043
2           227709.3     531886H1       1796   2064
2           227709.3     732765R1       1796   2359
2           227709.3     323492H1       1796   2072
2           227709.3     1755142H1      1811   2072
2           227709.3     2088594H1      1817   2083
2           227709.3     1531927H1      1820   2036
2           227709.3     1281210H1      1828   1963
2           227709.3     618185H1       1834   2133
2           227709.3     072066H1       1833   2072
2           227709.3     920524H1       1837   2173
2           227709.3     g2030053       1840   2258
2           227709.3     g681548        1845   2248
2           227709.3     g1190789       1847   2173
2           227709.3     3052574H1      1853   2158
2           227709.3     4546684H1      1856   1963
2           227709.3     g1846206       1855   2184
2           227709.3     1231442H1      1856   2170
2           227709.3     4546692H1      1856   1961
2           227709.3     1231220H1      1856   2108
2           227709.3                    1861   2153
2           227709.3     2040451H1      1861   2194
2           227709.3     5781387H1      1861   2129
2           227709.3     wwwW i ^v./( 1 1  1861   2181
2           227709.3     3479633H1      1859   2214
2           227709.3     2196589H1      1861   2123
2           227709.3     3a72090H1      1867   2086
2           227709.3     1210943P1      1867   2217
2           227709.3                    1867   2104
2           227709.3     3120279H1      1867   2163
2           227709.3     1 Z. 1 VsJ 7*-t>_/ 1 1 1  1867   2197

9 


997709 3 


^A770'^9hn 


1869 


91 '^3 

tC, 1 oo 


9 


997700 3 


^J*4ov*40wn I 


1 A7S 


91 77 
^ 1 / / 


9 


997700 3 


n30331M1 
uouoo 1 n 1 


1 A77 
1 o / / 


91 SI 

^ t O 1 


9 


9977O0 3 


y 1 7 / uu*-io 


1 o / o 


9903 


9 


907700 3 


99^7019W1 

zzo/u i zn 1 


t ooo 


91 '^O 
Z I oV 


9 


997700 3 


/UOVOZrl 1 


1 O 7 I 


9900 
zzuv 


9 


9977O0 3 
zz / / uv . o 


zoo^z-^vn 1 


1 OV I 


9909 
zzuz 


9 


997700 3 
ZZ/ / uv.o 


^7 1 99^^AM 1 


1 vuz 


9999 


9 


997700 3 

ZZ/ /uv.o 


o/vozovri 1 


1 003 
1 VUO 


991/1 

ZZ 


9 


9977O0 3 
ZZ/ /uv.o 


MQovozn 1 


1 VUO 


91 71 

Z 1 / 1 


9 


997700 3 
ZZ/ / uv.o 


ovo*4uon 1 


lOlO 

1 V 1 u 


99 lO 


9 


9977O0 3 
ZZ/ / uv.o 


9i403*v!inWl 


lOl ^ 

1 7 1 O 


9917 


9 


997700 3 

^^/ / L^V .O 


tJL/vo 1 o/ ri t 


lOl A 

1 7 1 O 


9917 


9 


997700 3 


^9S7073H1 


1098 

1 7 




9 


09110Q 3 

iL^I / WV .O 


W/ ^M7»-lf i 1 


1Q36 


91 78 


9 


997709 3 




1936 

1 70vJ 


9946 


9 


297709 3 


Q7'^A0AH1 


1936 


9917 


9 


997709 3 




1Q37 

1 7 0/ 


9917 


9 


997709 3 


u / \j 7 7 1 n 1 


1936 


9997 


9 


997709 3 


w/ vJOzkjn 1 


1936 


91 19 

^1 IT 


9 


227709 3 




1936 


9162 


9 


997709 3 




1938 

\ 700 


9917 


2 


227709 3 




1938 


2217 


2 


2           227709.3     5881208H1      1939   2217
2           227709.3     5888519H1      1939   2212
2           227709.3     5890218H1      1939   2212
2           227709.3     4783188H1      1938   2225
2           227709.3     2317709H1      1941   2222
2           227709.3     1876721H1      1952   2217
2           227709.3     94A03'^'^W1    1954   999*^
2           227709.3     734056H1       1953   2076
2           227709.3     2938267H1      1957   2217
2           227709.3     3166569H1      1957   2217
2           227709.3     4591368H1      1966   2227
2           227709.3     2397855H1      1966   2240
2           227709.3     6105204H1      1965   2217
2           227709.3     874849H1       1968   2217
2           227709.3     4458852H1      1967   2217
2           227709.3     874849R1       1968   2621
2           227709.3     1896 1 ooHl    1989   2217
2           227709.3     2470872T6      1988   2644
2           227709.3     lo3l904H 1     1994   2217
2           227709.3     2433o78H I     2003   2173
2           227709.3     4893472H1      2003   2337
2           227709.3     2395264H1      2016   2217
2           227709.3     4203434H1      2018   2350
2           227709.3     4886606H1      2025   2329
2           227709.3     4886606F6      2025   2094
2           227709.3     504991H1       2036   2217
2           227709.3     3123928H1      2035   2317
2           227709.3     372978R6       2539   2691
2           227709.3     2356168H1      2576   2698
2           227709.3     213804H1       2632   2687
2           227709.3     3347292H1      461    695
2           227709.3     5013953H1      518    793
2           227709.3     4327103H1      527    781
2           227709.3     3295045H1      569    811
2           227709.3     4012389H1      621    883
2           227709.3     1377181F1      637    1043
2           227709.3     1377181H1      637    880
2           227709.3     5883912H1      663    856
2           227709.3     5886832H1      663    927
2           227709.3     5881250H1      664    943
2           227709.3     377379H1       687    946
2           227709.3     2098327H1      697    942
2           227709.3     3511206H1      728    873
2           227709.3     4031927H1      736    993
2           227709.3     2448606H1      738    977
2           227709.3     2473593T6      743    1332
2           227709.3     2444327H1      744    975
2           227709.3     388684H1       774    1038
2           227709.3     924593R1       778    1154
2           227709.3     924593H1       778    1044
2           227709.3     2707537T6      791    1332
2           227709.3     1842323T6      803    1331
2           227709.3     1386726H1      813    1109
2           227709.3     1436357H1      816    1076
2           227709.3     1436357F1      816    1379
2           227709.3     1842323H1      818    1009
2           227709.3     1842323R6      818    1347
2           227709.3     2757452H1      862    1137
2           227709.3     338917H1       874    1099
2           227709.3     g1382744       934    1319
2           227709.3     g2880866       952    1325
2           227709.3     5289932H1      954    1212
2           227709.3     736987R6       967    1219
2           227709.3     g2955000       966    1369
2           227709.3     736987H1       967    1187
2           227709.3     4792654H1      971    1250
2           227709.3     270359H1       989    1337
2           227709.3     2532882H1      985    1263
2           227709.3     3246655H1      990    1254
2           227709.3     g2106694       1001   1373
2           227709.3     6210707H1      1069   1397
2           227709.3     340933H1       1086   1262
2           227709.3     4638914H1      1094   1344
2           227709.3     6211794H1      1106   1397
2           227709.3     1220278H1      1148   1408
2           227709.3     4762916H1      1193   1486
2           227709.3     6208765H1      1202   1504
2           227709.3     3167466H1      1208   1497
2           227709.3     4541633H1      1218   1466
2           227709.3     1389793H1      1      242
2           227709.3     1221361H1      146    324
2           227709.3     6210207H1      1217   1524
2           227709.3     6208587H1      1217   1448
2           227709.3     4321163H1      149    454
2           227709.3     2197686H1      1241   1389
2           227709.3     3871923H1      142    426
2           227709.3     2473593H1      205    438
2           227709.3     014Q86H1       1244   1539
2           227709.3     g1809628       1244   1573
2           227709.3                    1274   1510
2           227709.3     013903H1       1285   1535
2           227709.3     3159468H1      1288   1590
2           227709.3     5219130H1      1303   1575
2           227709.3     2473593F6      205    346
2           227709.3     862771H1       1304   1579
2           227709.3     3368510H1      1323   1609
2           227709.3     2473008H1      1340   1591
2           227709.3     3633133H1      1353   1666
2           227709.3     2470872F6      267    453
2           227709.3     2472485H1      1367   1615
2           227709.3     3940725H1      1371   1561
2           227709.3     1839612H1      1376   1668
2           227709.3     1839635H1      1376   1702
2           227709.3     2440859H1      1388   1640
2           227709.3     2560406H1      1389   1677
2           227709.3                    1411   1595
2           227709.3                    1432   1807
2           227709.3     4753851H1      1460   1734
2           227709.3     855862R1       1460   2074
2           227709.3     855862H1       1460   1682
2           227709.3     5436709H1      1470   1708
2           227709.3     3872292H1      1474   1684
2           227709.3     2470872H1      267    518
2           227709.3     4738304H2      363    616
2           227709.3     634613H1       376    613
2           227709.3     4890310H1      1474   1746


2 


227709.3 


~*^*-+vJO<or i 1 


0 7VO 


uoo 


2 


227709 


nl '^O'^Sl 8 

^ 1 >.^wo»o 1 \_> 


1 •-»oo 


201 1 


o 




*+LJO 1 70Urj 1 


1 407 


1 AO^ 
» OVO 


o 


227709 


z / 0 IV/ ori 1 


1 

1 OU 1 


1 / mO 


o 


227709 


7L>*-rOUUI\ 1 


ovo 


000 
vvv 


o 


227709 


MV0*4^0Zri 1 


ovo 


A7/1 


o 




^'^9'^'^n^^H 1 

^ozoouun 1 


407 


AO! 

OV 1 


9 




000000 n I 


^0 1 


AO A 
OVO 


o 




ooooovn 1 


*40 1 


79 A 
/ ZO 


9 


9977nO 


/n77^A'^7 


i 0 1 Z 


1 0 1 0 


9 


9077nO 


yo/ ouo/ 


[ 0 i Z 


1 04U 


9 


997700 


'^O'^A^l OWl 
ouoo*4 1 vn 1 


1 0 1 


1 ouu 


9 


997700 


zoouv 1 1 


1010 


1 /oo 


9 


997700 


c;90A07AW9 

ozuov / On z 


i OoU 


1 OUO 


9 


997700 


U/OOoO 1 0 


ZUoV 


ZOoU 




007700 "5 


QO 1 /OOo/ 


ZUo4 


001 "7 
ZZ 1 / 


9 


997700 

zz/ /uy.o 


AAOOft7Tl 
OOZUo/ 1 1 


zuoo 


OA/ICS 

Z04V 


9 


997700 
ZZ/ /UV.O 


OOZUO /ri 1 


zuoo 


Zo4^ 


9 


997700 
ZZ/ /UV.O 


Q 1 VODOOA 


zuoo 


Z4Z 1 


9 


997700 
ZZ/ / UV.O 


9/lAOfi07TA 

z^ovoz/ 10 


zuoo 


Z040 


9 


997700 7 
ZZ/ / UV.O 


zovov/ori 1 


zuoo 


on Q 
ZO 1 0 


9 


997700 
ZZ/ /UV.O 


1 oooz^ 1 10 


ZU/4 


Z04o 


9 


997700 
ZZ / / UV.O 


1 90 A Ano W 1 

1 ZZOOUVrt 1 


ZU/D 


ZoOo 


9 


997700 
ZZ / /UV.O 


1 /I AOn 1 TA 
1 -^oOVU 1 jO 


ZU/D 


OA/1 1 

Z04 1 


9 


997700 
ZZ / / UV.O 


9A97 1 C.'^W 1 
ZOZ/ 1 oOri 1 


90RA 

zuoo 


Oyl/tO 
Z^^V 


9 


997700 
ZZ/ / UV.O 


1 A7AOAnWl 
1 0/ ovouri 1 


90AA 
ZUOO 


OA7 /I 
Zo/^ 


9 


997700 
ZZ/ /UV.O 


Z<^/oZ^U !0 


ZUOO 


ZO^O 


9 


997700 

ZZ/ /UV.O 


z*4o 1 oovn i 


90A7 
ZUO/ 


ZOZV 


9 


997700 

ZZ/ /UV.O 


9i40nA9^l-!l 

ZMUUOZmPI I 


90A7 
ZUO/ 


9A*^R 
ZOOO 


9 


997700 

ZZ / / UV.O 


1 A70AAAPA 
1 0/ voooro 


ZUVO 


ZOoU 


9 


997700 

ZZ / / UV.O 


1 A70AAAW 1 
1 0/ vooon 1 


900^ 
ZUtO 


9 A AO 
ZOOV 


9 


22770Q 

ZZ/ / UV.O 


1 A70AAATA 
1 0/ vooo 10 


900A 
ZUVO 


9A^A 
ZOOO 


9 


997709 

/ vO 7 .0 


9lQS49^H1 
z [ 7\0*4zuri 1 


900A 

ZUVO 


9^ on 


9 


997709 

ZZ/ / UV.O 


n 1 0 A9A 1 A 
y 1 vuzo 1 0 


91 09 
z 1 uz 


ZOVZt 


9 


227709 S 


2'^2'Sft47H1 


21 19 
z 1 1 z 


9A74 

ZO/^ 


9 


227709 


*+*-noQJwoun 1 


91 19 
z 1 1 z 


9^1/1 


2 


227709.3 


394072516 


21 IR 


9AS4 

ZUO<4 


9 


/ / W 7 .0 


9917S'^9H1 

<uV 1 / UOTr n 1 


91 lO 
z 1 1 V 


941 A 

Z*4 1 0 


2 


227709.3 


1 0**/ lovjon 1 


91 "^O 
z i ou 


9*^71 

ZO / 1 


9 


227709 

^£.1 / UV lO 


^^4SAAW1 
ouMuuon 1 


91 Xl 


9'^A7 

ZOO/ 


9 


£.£.1 / ^7 .0 


'^9'^2l'^l HI 
ozo*4o 1 n 1 


Z 1 -^O 


9449 


2           227709.3     450205H1       2150   2378
2           227709.3     1448863H1      2150   2423
2           227709.3     2472138T6      2186   2646
2           227709.3     736987T6       2206   2648
2           227709.3     g3932020       2237   2687
2           227709.3     1676401H1      2247   2475
2           227709.3     406441H1       2259   2522
2           227709.3     334657H1       2259   2512
2           227709.3     5888253H1      2261   2496
2           227709.3     3102430H1      2261   2530
2           227709.3     6106355H1      2261   2524
2           227709.3     g3182016       2261   2693
2           227709.3     604792H1       2261   2486
2           227709.3     2428425H1      2261   2433
2           227709.3     5883372H1      2261   2444
2           227709.3     g4264131       2262   2702
2           227709.3     633186H1       2262   2544
2           227709.3     3993266H1      2261   2540
2           227709.3     2017401H1      2263   2427
2           227709.3     2424057H1      2268   2536
2           227709.3     404680H1       2273   2496
2           227709.3     3153874H1      2288   2593
2           227709.3     g4291477       2288   2694
2           227709.3     4061504H1      2288   2546
2           227709.3     g2968506       2289   2694
2           227709.3     3123730H1      2289   2606
2           227709.3     3123955H1      2289   2589
2           227709.3     g4115080       2293   2694
2           227709.3     308633H1       2295   2534
2           227709.3     2397634H1      2297   2495
2           227709.3     g4332177       2302   2694
2           227709.3     g4267824       2303   2694
2           227709.3     308633F1       2303   2687
2           227709.3     308633R1       2303   2687
2           227709.3     2371681H1      2309   2551
2           227709.3     2421774H1      2309   2547
2           227709.3     g1810042       2311   2687
2           227709.3     g3958397       2313   2677
2           227709.3     4695290H1      2316   2582
2           227709.3     2756565H1      2320   2630
2           227709.3     g612404        2323   2687
2           227709.3     2445854H1      2327   2587
2           227709.3     449703H1       2332   2458
2           227709.3     2877181H1      2334   2622
2           227709.3     1211271R1      2334   2687
2           227709.3     1211271T1      2334   2649
2           227709.3     1211271H1      2334   2608
2           227709.3     g616409        2341   2660
2           227709.3     2017458H1      2340   2619
2           227709.3     1534375H1      2342   2572
2           227709.3     3145020H1      2344   2683
2           227709.3     1539734H1      2358   2600
2           227709.3     3786174H1      2363   2661
2           227709.3     1464741F1      2367   2687
2           227709.3     1454741H1      2367   2632
2           227709.3     2717951H1      2372   2548
2           227709.3     1359816H1      2374   2618
2           227709.3     1359816F1      2374   2694
2           227709.3     g564645        2385   2694
2           227709.3     g1154316       2388   2698
2           227709.3     g891561        2393   2707
2           227709.3     2473312H1      2393   2648
2           227709.3     g2992779       2394   2643
2           227709.3     1218580T6      2395   2649
2           227709.3     1218580T1      2395   2649
2           227709.3     g824343        2396   2719
2           227709.3     1218580R6      2395   2691
2           227709.3     1218573H1      2395   2641
2           227709.3     286961H1       2403   2690
2           227709.3     4367190H1      2408   2684
2           227709.3     2399122H1      2420   2672
2           227709.3     5882896H1      2423   2687
2           227709.3     g967282        2422   2700
2           227709.3     5883904H1      2423   2687
2           227709.3     5883908H1      2423   2687
2           227709.3     5882375H1      2424   2567
2           227709.3     g1191305       2429   2710
2           227709.3     794290H1       2437   2664
2           227709.3     520742H1       2442   2678
2           227709.3     g646174        2443   2687
2           227709.3     1538631H1      2443   2651
2           227709.3     1722285H1      2444   2677
2           227709.3     g1202716       2469   2701
2           227709.3     862903T1       2482   2648
2           227709.3     095638H1       2484   2694
2           227709.3     862903R1       2484   2694
2           227709.3     3124846H1      2490   2692
2           227709.3     5906470H1      2497   2691
2           227709.3     2535162H1      2525   2651
2           227709.3     4460277H1      2533   2694
2           227709.3     372978T6       2539   2649
2           227709.3     372978H1       2539   2686
3           237703.2     g1963754       1      374
3           237703.2     g1137733       95     407
3           237703.2     g843567        124    414
3           237703.2     3070350H1      189    477
3           237703.2     3070350F6      189    709
3           237703.2     1439542H1      413    686
3           237703.2     3203352H1      437    711
3           237703.2     g2013304       598    954
3           237703.2     3799002H1      701    1010
3           237703.2     044160T6       958    1453
3           237703.2     824258H1       1020   1254
3           237703.2     gl8943<;^      1031   1472
3           237703.2     3491432H1      1052   1315
3           237703.2     2601554F6      1059   1602
3           237703.2     2601554H1      1060   1336
3           237703.2     4617816H1      1073   1337
3           237703.2     4058186H1      iin    1197
3           237703.2     5554248H1      1145   1355
3           237703.2     5554148H1      1145   1386
3           237703.2     g3594366       1199   1611
3           237703.2     5122547H1      1278   1516
3           237703.2     2278690H1      1379   1654
3           237703.2     2278690R6      1379   1879
3           237703.2     6564683H1      1402   1658
3           237703.2     950519H1       1436   1678
3           237703.2     950519R6       1436   1723
3           237703.2     g1123529       1437   1805
3           237703.2     5260937H1      1514   1744
3           237703.2     3386649H1      1527   1744
3           237703.2     5272970H1      1532   1780
3           237703.2     3808872H1      1533   1838
3           237703.2     950519T6       1554   2019
3           237703.2     319252H1       1592   1986
3           237703.2     4411457H1      1669   1947
3           237703.2     g1933240       1669   2147
3           237703.2     2601554T6      1698   2313
3           237703.2     824257T6       1699   2313
3           237703.2     2859138T6      1776   2312
3           237703.2     1872409F6      1788   2150
3           237703.2     1872409H1      1788   2062
3           237703.2     1572418H1      1798   1997
3           237703.2     1872409T6      1805   2312
3           237703.2     530381H1       1810   1963
3           237703.2     5583255H1      1811   2075
3           237703.2     126942H1       1816   2025
3           237703.2     2278690T6      1829   2311
3           237703.2     1211984H1      1832   2066
3           237703.2     2703527H1      1832   2106
3           237703.2     3253906H1      1886   2159
3           237703.2     g3934221       1896   2349
3           237703.2     1620273H1      1897   2117
3           237703.2     g2819399       1908   2351
3           237703.2     g3895924       1936   2349
3           237703.2     g2881190       1962   2270
3           237703.2     040587H1       1971   2158
3           237703.2     g3147053       1973   2349
3           237703.2     g843522        2009   2349
3           237703.2     g1844904       2023   2349
3           237703.2     g2881790       2024   2349
3           237703.2     g2820075       2030   2349
3           237703.2     g2237723       2051   2350
3           237703.2     238512H1       2085   2313
3           237703.2     292954H1       2182   2320
3           237703.2     g2013921       2238   2522
3           237703.2     g1980268       2386   2742
4           240091.1     2898155H1      1      289
4           240091.1     2434264H1      3      215
4           240091.1     5075278H1      3      127
4           240091.1     2647594H1      1720   1770
4           240091.1     4785026H1      1803   2073
4           240091.1     4785001H1      1803   2069
4           240091.1     3600323H1      1258   1557
4           240091.1     2448218F6      1267   1485
4           240091.1     2448218H1      1267   1508
4           240091.1     1391961T6      1267   1730
4           240091.1     083349H1       1309   1464
4           240091.1     070643H1       1309   1543
4           240091.1     3712971T6      1317   1744
4           240091.1     3399693H1      1384   1606
4           240091.1     g3000643       1400   1562
4           240091.1     g3659260       1446   1772
4           240091.1     3491102H1      1447   1559
4           240091.1     2286846H1      1496   1696
4           240091.1     4575130H1      1513   1756
4           240091.1     2584803H1      1539   1770
4           240091.1     2584803F6      1539   1770
4           240091.1     4399541H1      1590   1833
4           240091.1     489490H1       1599   1844
4           240091.1     5541073H1      1605   1804
4           240091.1     797486H1       1609   1772
4           240091.1     2435453H1      3      231
4           240091.1     2434264R6      3      491
4           240091.1     527914H1       4      275
4           240091.1     4382936H1      10     241
4           240091.1     4210908H1      20     292
4           240091.1     3615238H1      47     340
4           240091.1     3615238F6      47     528
4           240091.1     2733107H1      54     275
4           240091.1     494380H1       62     307
4           240091.1     1391961F6      66     475
4           240091.1     1391961H1      66     318
4           240091.1     580112H1       359    558
4           240091.1     1232706F6      389    842
4           240091.1     1232706H1      389    629
4           240091.1     3487133H1      432    698
4           240091.1     g4244249       511    981
4           240091.1     447619H1       580    799
4           240091.1     5782214H1      670    964
4           240091.1     4913546F6      830    1249
4           240091.1     4913546H1      830    1108
4           240091.1     3892111H1      834    1130
4           240091.1     4742244H1      836    1102
4           240091.1     g1484624       867    1316
4           240091.1     2376485F6      901    1205
4           240091.1     2376485H1      901    1124
4           240091.1     2376485T6      902    1167
4           240091.1     1849607H1      907    1198
4           240091.1     4uouyy^n 1     952    1198
4           240091.1     07A/1QQ ALJ 1  975    1201
4           240091.1     *tOOOzUoH 1    1138   1401
4           240091.1     1 OqO"70ATA    1176   1729
4           240091.1     O/rq/OA/ITA    1210   1746
4           240091.1     44^4>3 1 on 1  1223   1459
5           243096.6                    1006   1482
5           243096.6     gooy/olo       1016   1479
5           243096.6     goy33703       1018   1479
5           243096.6     g4o 72582      1022   1487
5           243096.6     4 1 uou7 1 H 1  1030   1104
5           243096.6     o47oo26Hl      1032   1183
5           243096.6     04457447       1045   1481
5           243096.6     g4222324       1075   1480
5           243096.6     631209R6       1076   1478
5           243096.6     g3118595       1075   1478
5           243096.6     g2577165       1076   1480
5           243096.6     g2185952       1078   1493
5           243096.6     g2063697       1079   1482
5           243096.6     222052H1       1079   1215
5           243096.6     222052F1       1078   1479
5           243096.6     222052R1       1078   1479
5           243096.6     g3737532       1083   1509
5           243096.6     1849724T6      1095   1440
5           243096.6     g4110131       1115   1481
5           243096.6     631209T6       1116   1438
5           243096.6     2446727T6      1119   1438
5           243096.6     g3155321       1129   1475
5           243096.6     go I /ol 76    1148   1494
5           243096.6     y4o 1 /7H 1    1149   1428
5           243096.6     y4o 1 //R 1    1149   1488
5           243096.6     } y4^ 1 /on 1  1163   1441
5           243096.6     I y4^: 1 /OKO  1163   1418
5           243096.6     1 y4^ J Oon I  1163   1440
5           243096.6     0004 I oo n 1  1164   1433
5           243096.6     (4 1 oo 1 4 10  1209   1430
5           243096.6     14 1 oo I4n 1  1216   1431
5           243096.6     14 J oo04n 1   1216   1468
5           243096.6     14 1 OO l4rO   1216   1479
5           243096.6     0o^o4on 1      1218   1458
5           243096.6     n4in771 1      1 oon  1526
5           243096.6     g4457962       1227   1483
5           243096.6     g2837785       1228   1479
5           243096.6     g819991        1238   1496
5           243096.6     g564440        1237   1488
5           243096.6     g816379        1251   1640
5           243096.6     g885380        1252   1488
5           243096.6     g768804        1261   1481
5           243096.6     6093263H1      1263   1492
5           243096.6     g645318        1286   1488


5 


243096.6 


g566867 


1292 


1 ozo 


5 


243096.6 


q81631 ] 


1302 


1 A70 
! O/ V 


5 


243096.6 


g671079 


1296 


» *400 


5 


243096.6 


a22 19072 


1 2Q7 




5 


243096.6 


g 2539665 


1 300 

1 wCw 


1 y170 


5 


243096.6 


g670466 


1300 


1 OZO 


5 


243096.6 


2474482H1 


i W 1 W 


1 04z 


5 


243096.6 


□4328047 


1 w J *4 




5 


243096.6 


02205935 


1 wOw 




5 


243096.6 


0832021 


) wU/ 


i O/o 


5 


243096.6 


a22059v^6 


1 w/ \-> 




5 


243096.6 


a2789365 


\ ovo 


14/y 


5 


243096.6 




1 /II A 

1 *4 1 O 


1 54L) 


5 


243096.6 


0900098 

/ www 7 W 


141Q 




5 


243096.6 


a5676'^9 

y WW / *.Jw ~ 


1 *4Z*4 


1 OO/ 


5 


243096.6 


1918362H1 


J ^ / w 


1 /o 1 


5 


243096.6 


472761 1 HI 


1 v^/O 


1 004 


5 


243096,6 




I O*X0 


1 (~i7Z. 

ly/o 


5 


243096.6 




1 oo 1 


zU I z 


5 


243096,6 


y vjwljt I o 


1 A^/l 

1 ocw 


zU I 2. 


6 


243096.6 


141461 9H1 


1 OoZ 


1 oVU 


5 


243096.6 


4761 9snf-n 

/ J ^wwn I 


1 AAO 


1 VoV 


5 


243096.6 




T AA'\ 
1 OOO 


ly/z 


5 


243096.6 


0561207 

WW 1 ^w / 


T AA^ 


t voo 


5 


243096,6 


02002379 


1 AA*N 
\ OOO 


zUUz 


5 


243096,6 


0709471 


1 AA^ 
1 OOO 


1 OOO 


5 


243096.6 


4761242H1 


1 AAi^ 


1 0*30 

1 yov 


5 


243096.6 


0518391 

V^W 1 WW 7 1 


1 A7A 
1 O/ o 


i yoo 


5 


243096.6 


4595686H 1 

"^w^wwwwr 1 1 


1 f\j/ 


1 yo 1 


5 


243096.6 


151 1493H1 


1 709 


(yyo 


5 


243096.6 


151 149^FtS 


1 700 




5 


243096.6 


1512376H1 

1 W 1 ^W t \Jl 1 I 


1 709 
1 / vz 




5 


243096.6 


02003356 

^ www www 


1 Q99 


9nA7 


5 


243096.6 


4697285H1 




OA/1A 
Z*+*40 


5 


243096.6 


4941432H1 


2'^ftl 


ZO/ O 


5 


243096.6 


1230891 HI 




ZOL/O 


5 


243096.6 


1522037H1 


2421 


ZU^w 


5 


243096.6 


3749969H1 


2502 


97PQ 


5 


243096.6 


2125142H1 


2S27 


Z/ Vw 


5 


243096.6 


2125142F6 




9R/1 1 
ZOA 1 


5 


243096.6 


121562H1 


2620 


2806 


5 


243096.6 


856668H1 


2745 


2933 


5 


243096.6 


5882521 HI 


2758 


3030 


5 


243096.6 


5888582H1 


2759 


2976 


5 


243096.6 


5882569H1 


2760 


3030 


5 


243096.6 


9775350 


2793 


3137 


5 


243096.6 


9705857 


2790 


3138 


5 


243096.6 


92002380 


2803 


3138 


5 


243096.6 


5927949H1 


2845 


3140 


5 


243096.6 


1511493T6 


2883 


3500 
5 243096.6 1335311H1 2951 [illegible]
5 243096.6 1613745H1 2965 [illegible]
5 243096.6 3472823H1 3005 [illegible]
5 243096.6 g570224 3048 [illegible]
5 243096.6 g4095588 3134 [illegible]
5 243096.6 g831152 3143 [illegible]
5 243096.6 g4286632 3205 [illegible]
5 243096.6 5907720H1 3208 [illegible]
5 243096.6 g4187457 3286 [illegible]
5 243096.6 g3842315 3295 [illegible]
5 243096.6 g4005713 3303 [illegible]
5 243096.6 g4006389 3305 [illegible]
5 243096.6 g4006377 3305 [illegible]
5 243096.6 g4006150 [illegible] [illegible]
5 243096.6 g4006070 [illegible] [illegible]
5 243096.6 g4187003 [illegible] [illegible]
5 243096.6 g4188554 [illegible] [illegible]
5 243096.6 g4006771 [illegible] [illegible]
5 243096.6 g4072007 [illegible] [illegible]
5 243096.6 [illegible] [illegible] [illegible]
5 243096.6 [illegible] [illegible] [illegible]
5 243096.6 g4005644 [illegible] [illegible]
5 243096.6 [illegible] [illegible] [illegible]
5 243096.6 5289394H1 [illegible] [illegible]
5 243096.6 g710217 [illegible] [illegible]
5 243096.6 g694295 3619 [illegible]
5 243096.6 g2206232 [illegible] [illegible]
5 243096.6 g2206104 [illegible] [illegible]
5 243096.6 2897215H1 1 [illegible]
5 243096.6 3541808H1 [illegible] [illegible]
5 243096.6 2352032H1 [illegible] [illegible]
5 243096.6 2446727F6 44 [illegible]
5 243096.6 3123367H1 [illegible] [illegible]
5 243096.6 4385825H1 181 [illegible]
5 243096.6 2446727H1 44 [illegible]
5 243096.6 2905666H1 45 [illegible]
5 243096.6 2767616H1 46 [illegible]
5 243096.6 4521271H1 214 [illegible]
5 243096.6 1725750H1 47 [illegible]
5 243096.6 g1965606 235 [illegible]
5 243096.6 3117919H1 47 328
5 243096.6 2762827H1 49 309
5 243096.6 5395762H1 245 510
5 243096.6 5585677H1 251 484
5 243096.6 3416289H1 253 507
5 243096.6 5407275H1 257 511
5 243096.6 5407149H1 257 520
5 243096.6 3452689H1 49 240
5 243096.6 4819033H1 292 515
5 243096.6 483458H1 50 302
WO 00/75298

5 243096.6 [illegible] 291 537
5 243096.6 [illegible] 50 296
5 243096.6 [illegible] 307 480
5 243096.6 [illegible] 51 147
5 243096.6 [illegible] 73 383
5 243096.6 [illegible] [illegible] 378
5 243096.6 [illegible] 81 363
5 243096.6 [illegible] 82 682
5 243096.6 [illegible] 82 339
5 243096.6 3692401H1 82 285
5 243096.6 [illegible] 83 336
5 243096.6 [illegible] 329 555
5 243096.6 [illegible] 84 331
5 243096.6 4977750H1 89 382
5 243096.6 g4244257 343 817
5 243096.6 3165660H1 89 373
5 243096.6 4043243H1 89 406
5 243096.6 3500940H1 347 625
5 243096.6 3580985H1 91 415
5 243096.6 1518953F6 93 405
5 243096.6 g672832 92 414
5 243096.6 g574622 92 418
5 243096.6 2206923H1 355 624
5 243096.6 2681788H1 92 287
5 243096.6 [illegible] 92 444
5 243096.6 [illegible] 365 938
5 243096.6 790680H1 365 584
5 243096.6 [illegible] 386 701
5 243096.6 [illegible] 92 280
5 243096.6 3510335H1 92 396
5 243096.6 [illegible] 393 653
5 243096.6 [illegible] 395 583
5 243096.6 [illegible] 92 415
5 243096.6 [illegible] 93 484
5 243096.6 [illegible] 93 492
5 243096.6 [illegible] [illegible] 667
5 243096.6 [illegible] 97 377
5 243096.6 [illegible] 421 665
5 243096.6 [illegible] 102 323
5 243096.6 3337242H1 105 332
5 243096.6 [illegible] 421 709
5 243096.6 5165830H1 113 391
5 243096.6 1919378R6 432 865
5 243096.6 2078775H1 114 391
5 243096.6 1919378H1 432 700
5 243096.6 1798353H1 115 371
5 243096.6 5109893H1 447 675
5 243096.6 3581083H1 116 378
5 243096.6 2202470H1 456 711
5 243096.6 1642210H1 461 676
PCT/US00/15344

TABLE 4

5 243096.6 [illegible] 122 399
5 243096.6 1642206H1 461 676
5 243096.6 [illegible] 136 449
5 243096.6 4932616H1 463 618
5 243096.6 g571269 136 492
5 243096.6 [illegible] 464 690
5 243096.6 756217H1 464 706
5 243096.6 [illegible] 464 764
5 243096.6 g677690 136 462
5 243096.6 1754027H1 464 704
5 243096.6 [illegible] 139 449
5 243096.6 [illegible] 139 482
5 243096.6 026083H1 509 692
5 243096.6 2768492H1 143 413
5 243096.6 836506R1 513 1092
5 243096.6 5294950H1 147 395
5 243096.6 836506H1 513 759
5 243096.6 1520256H1 517 685
5 243096.6 173031H1 157 390
5 243096.6 5197820H2 161 419
5 243096.6 g2240993 522 940
5 243096.6 g766746 161 395
5 243096.6 3728525H1 572 854
5 243096.6 g677072 160 502
5 243096.6 2828115T6 574 950
5 243096.6 g2816446 612 874
5 243096.6 4536339H1 624 877
5 243096.6 2670757H1 629 868
5 243096.6 4994228H1 697 1004
5 243096.6 793596H1 697 946
5 243096.6 g2058963 697 941
5 243096.6 1560602H1 719 948
5 243096.6 1535660H1 719 898
5 243096.6 g2058866 730 936
5 243096.6 2316449H1 764 1045
5 243096.6 686656H1 771 1028
5 243096.6 [illegible] 783 1432
5 243096.6 3659224H2 789 1073
5 243096.6 639411H1 798 1050
5 243096.6 5332257H1 820 1063
5 243096.6 [illegible] 837 1136
5 243096.6 2652025T6 859 1424
5 243096.6 961879T6 859 1437
5 243096.6 5088178T6 858 1465
5 243096.6 1849724F6 860 1441
5 243096.6 1849724H1 860 1135
5 243096.6 5395762T1 884 1440
5 243096.6 1919378T6 880 1452
5 243096.6 2663785H1 891 1144
5 243096.6 3625012H1 904 1051
5 243096.6 2683022H1 931 [illegible]
5 243096.6 3499439H1 933 [illegible]
5 243096.6 4005888H1 935 [illegible]
5 243096.6 1682469T7 939 [illegible]
5 243096.6 2351530H1 941 [illegible]
5 243096.6 g2063947 951 [illegible]
5 243096.6 1668866H1 960 [illegible]
5 243096.6 1667633H1 960 [illegible]
5 243096.6 g2358923 963 [illegible]
5 243096.6 3085780H1 971 [illegible]
5 243096.6 g514507 161 [illegible]
5 243096.6 g830479 161 [illegible]
5 243096.6 g816410 162 [illegible]
6 244366.6 1889554H1 [illegible] [illegible]
6 244366.6 1889554F6 493 [illegible]
6 244366.6 4298324H1 [illegible] [illegible]
6 244366.6 853003H1 19 [illegible]
6 244366.6 g2178494 517 [illegible]
6 244366.6 853003R6 [illegible] [illegible]
6 244366.6 3327565H1 555 [illegible]
6 244366.6 2263295H1 [illegible] [illegible]
6 244366.6 [illegible] 604 [illegible]
6 244366.6 1285225H1 606 [illegible]
6 244366.6 2674162H1 661 [illegible]
6 244366.6 3101288H1 295 [illegible]
6 244366.6 3295139H1 815 [illegible]
6 244366.6 6002940H1 886 [illegible]
6 244366.6 3101288F6 295 [illegible]
6 244366.6 6002740H1 904 [illegible]
6 244366.6 3246058F6 941 [illegible]
6 244366.6 3246058H1 941 1109
6 244366.6 3887233H1 [illegible] [illegible]
6 244366.6 2431320H1 972 1109
6 244366.6 1513444H1 978 [illegible]
6 244366.6 2813740H1 1071 [illegible]
6 244366.6 2815664H1 1071 [illegible]
6 244366.6 2813707H1 1071 [illegible]
6 244366.6 3492628H1 1138 1414
6 244366.6 2183893H1 1190 [illegible]
6 244366.6 5641164H1 1238 [illegible]
6 244366.6 [illegible] 1239 1487
6 244366.6 3155135H1 1254 1487
6 244366.6 3075416H1 1282 1565
6 244366.6 3559024H1 1351 1639
6 244366.6 3451987H1 1525 1785
6 244366.6 4378692H1 1597 1817
6 244366.6 g2162961 1742 2237
6 244366.6 3890528H1 1764 1919
6 244366.6 5017346H1 3006 3272
6 244366.6 1690531H1 2938 3105
6 244366.6 [illegible] 3007 3133
6 244366.6 [illegible] 3009 3145
6 244366.6 [illegible] 2979 3239
6 244366.6 [illegible] 3012 3292
6 244366.6 [illegible] 3013 3234
6 244366.6 5262767H2 3029 3291
6 244366.6 2626903H1 3031 3283
6 244366.6 1403785H1 3034 3328
6 244366.6 3321234H1 2980 3254
6 244366.6 [illegible] 3043 3400
6 244366.6 003176H1 3043 3543
6 244366.6 003684H1 3043 3424
6 244366.6 003185H1 3043 3550
6 244366.6 003182H1 3043 3493
6 244366.6 003701H1 3043 3471
6 244366.6 003615H1 3043 3429
6 244366.6 003188H1 3043 3483
6 244366.6 003127H1 3043 3453
6 244366.6 003465H1 3043 3433
6 244366.6 003422H1 3043 3380
6 244366.6 5290617H1 3002 3300
6 244366.6 003521H1 3043 3549
6 244366.6 003294H1 3043 3411
6 244366.6 003642H1 3043 3400
6 244366.6 003646H1 3043 3405
6 244366.6 003660H1 3043 3392
6 244366.6 5138512H1 3079 3394
6 244366.6 094605H1 3081 3258
6 244366.6 1726602H1 3114 3335
6 244366.6 1623474T6 3120 3724
6 244366.6 2381350H1 3122 3380
6 244366.6 768275H1 3130 3388
6 244366.6 [illegible] 3143 3350
6 244366.6 [illegible] 3161 3533
6 244366.6 [illegible] 3175 3471
6 244366.6 [illegible] 3197 3715
6 244366.6 [illegible] 3201 3503
6 244366.6 [illegible] 3202 3442
6 244366.6 [illegible] 3202 3403
6 244366.6 [illegible] 3222 3706
6 244366.6 [illegible] [illegible] 3416
6 244366.6 2112529T6 3250 3723
6 244366.6 g3933445 3274 3756
6 244366.6 4861989H1 3274 3567
6 244366.6 1737024F6 3292 3734
6 244366.6 g2584374 3285 3757
6 244366.6 1735490H1 3292 3561
6 244366.6 1737024H1 3292 3554
6 244366.6 393035H1 3303 3590
6 244366.6 2158031F6 3307 3758
6 244366.6 g4438605 3308 [illegible]
6 244366.6 2554055H1 [illegible] [illegible]
6 244366.6 g2934389 3326 [illegible]
6 244366.6 g4391414 3341 [illegible]
6 244366.6 4460645H1 3349 [illegible]
6 244366.6 g2208607 [illegible] [illegible]
6 244366.6 g2318342 [illegible] [illegible]
6 244366.6 g1062585 [illegible] [illegible]
6 244366.6 [illegible] [illegible] [illegible]
6 244366.6 g1925212 [illegible] [illegible]
6 244366.6 4726758H1 [illegible] [illegible]
6 244366.6 g2410378 [illegible] [illegible]
6 244366.6 867419H1 [illegible] [illegible]
6 244366.6 g616070 [illegible] [illegible]
6 244366.6 g2163418 [illegible] [illegible]
6 244366.6 g2555602 [illegible] [illegible]
6 244366.6 g561365 [illegible] [illegible]
6 244366.6 1889554T6 [illegible] [illegible]
6 244366.6 g2336915 [illegible] 3756
6 244366.6 [illegible] [illegible] [illegible]
6 244366.6 g4435130 [illegible] 3756
6 244366.6 g4525507 [illegible] [illegible]
6 244366.6 2158031H1 [illegible] 3756
6 244366.6 g2401624 [illegible] [illegible]
6 244366.6 g4268526 [illegible] [illegible]
6 244366.6 2009370H1 [illegible] [illegible]
6 244366.6 218073H1 [illegible] [illegible]
6 244366.6 2350763H1 [illegible] [illegible]
6 244366.6 1647483H1 [illegible] [illegible]
6 244366.6 2432662H1 [illegible] [illegible]
6 244366.6 901941R1 [illegible] [illegible]
6 244366.6 901941H1 [illegible] [illegible]
6 244366.6 901981H1 [illegible] [illegible]
6 244366.6 [illegible] 2545 [illegible]
6 244366.6 3483573H1 [illegible] [illegible]
6 244366.6 3565807H1 [illegible] [illegible]
6 244366.6 g2278841 2560 [illegible]
6 244366.6 g2178439 [illegible] [illegible]
6 244366.6 g2153824 [illegible] [illegible]
6 244366.6 g1329145 [illegible] [illegible]
6 244366.6 2198515H1 2647 2908
6 244366.6 2200581H1 2647 2725
6 244366.6 g1548506 2680 3207
6 244366.6 [illegible] 2694 2917
6 244366.6 3244071T6 2568 2798
6 244366.6 2936492H1 2700 2917
6 244366.6 600642H1 2586 2891
6 244366.6 g1124072 2711 2850
6 244366.6 g1833465 2712 2856
6 244366.6 g4327019 2736 2851
6 244366.6 3165778H1 2588 2925
6 244366.6 g2139164 2594 2835
6 244366.6 633565H1 2753 2917
6 244366.6 1520269F6 2766 3147
6 244366.6 1520026H1 2766 2917
6 244366.6 1520269H1 2766 2917
6 244366.6 g4095144 2611 2917
6 244366.6 1237708H1 2767 3016
6 244366.6 2768411H1 2770 3013
6 244366.6 g1321416 2615 2858
6 244366.6 2416809H1 2800 2917
6 244366.6 1570846H1 2811 3018
6 244366.6 3898114H1 2621 2859
6 244366.6 874422H1 2824 3131
6 244366.6 g1349372 2825 2952
6 244366.6 2815235H1 2842 3116
6 244366.6 4897079H1 2925 3201
6 244366.6 1231478H1 2632 2861
6 244366.6 2152887H1 2927 3042
6 244366.6 3510024H1 2635 2917
6 244366.6 1975441F6 2938 3306
6 244366.6 2189605H1 2938 3203
6 244366.6 1975441H1 2938 3088
6 244366.6 3470739H1 2938 3153
6 244366.6 g1844965 2643 2917
6 244366.6 1623474H1 2155 2382
6 244366.6 1338343H1 1799 2055
6 244366.6 2022520H1 2157 2423
6 244366.6 805131H1 2200 2397
6 244366.6 795024H1 2202 2393
6 244366.6 1338343F6 1799 2241
6 244366.6 1297158H1 1810 2050
6 244366.6 3354386H1 2223 2491
6 244366.6 2540345H1 2224 2461
6 244366.6 2313137H1 1887 2152
6 244366.6 2805024H1 2233 2536
6 244366.6 2454581T6 2281 2827
6 244366.6 g570404 1961 2245
6 244366.6 2773281H1 2282 2528
6 244366.6 3246058T6 2284 2807
6 244366.6 3332425T6 2286 2817
6 244366.6 3321733H1 1988 2109
6 244366.6 g2153937 2324 2754
6 244366.6 1464642H1 1995 2224
6 244366.6 g1319564 2331 2938
6 244366.6 3555988H1 2005 2304
6 244366.6 3384030H1 2043 2317
6 244366.6 g1898453 2332 2760
6 244366.6 g1062443 2337 2746
6 244366.6 3188982H1 2348 2686
6 244366.6 2255759H1 2045 2318
6 244366.6 2112529H1 2355 2621
6 244366.6 3693731H1 2364 2662
6 244366.6 1864803H1 2069 2351
6 244366.6 853003T6 2385 2815
6 244366.6 g1925211 2137 2615
6 244366.6 2641982H1 2385 2598
6 244366.6 2516658H1 2397 2535
6 244366.6 1864803T6 2414 2898
6 244366.6 1338343T6 2421 2814
6 244366.6 1398258H1 2440 2681
6 244366.6 1829874H1 2471 2738
6 244366.6 589137H1 2478 2685
6 244366.6 4855638H1 2147 2411
6 244366.6 1698592H1 2500 2726
6 244366.6 3524562H1 2154 2348
6 244366.6 g1319504 2523 2952
6 244366.6 1623474F6 2155 2682
7 405313.4 4640462H1 573 837
7 405313.4 g1774849 595 979
7 405313.4 4721077H1 54 194
7 405313.4 5944975H1 61 370
7 405313.4 1948647H1 596 828
7 405313.4 1592016H1 86 282
7 405313.4 1948647R6 596 1136
7 405313.4 g4070751 686 1137
7 405313.4 2384959H1 86 263
7 405313.4 4571373H1 715 978
7 405313.4 g954058 893 1203
7 405313.4 1559555H1 903 1120
7 405313.4 1559555F6 903 1363
7 405313.4 g617633 914 1316
7 405313.4 1302977H1 86 257
7 405313.4 4215272H1 919 1195
7 405313.4 g1492868 88 230
7 405313.4 4643815H1 962 1212
7 405313.4 965308H1 968 1255
7 405313.4 965308R1 968 1622
7 405313.4 1321926T6 132 483
7 405313.4 5136028H1 993 1266
7 405313.4 g4264253 155 609
7 405313.4 g3739298 156 610
7 405313.4 4306178H1 1008 1207
7 405313.4 g1237752 1009 1175
7 405313.4 4551446H1 1047 1310
7 405313.4 g4522664 210 522
7 405313.4 1628853H1 1049 1219
7 405313.4 1627193H1 1049 1261
7 405313.4 1316291H1 349 522
7 405313.4 1628853F6 1049 1649
7 405313.4 1283759H1 1053 1330
7 405313.4 4312209H1 1067 1367
7 405313.4 4103086H1 [illegible] 1239
7 405313.4 g1984348 1147 1472
7 405313.4 g2540589 1168 1616
7 405313.4 g1492809 369 527
7 405313.4 3584223H1 1177 1354
7 405313.4 102669H1 458 607
7 405313.4 4829437H1 540 737
7 405313.4 3674811H1 1186 1495
7 405313.4 1733325H1 1192 1412
7 405313.4 g1984560 561 804
7 405313.4 g2010449 2002 2335
7 405313.4 2682711H1 2014 2309
7 405313.4 g4535531 2016 2337
7 405313.4 g1773873 2022 2340
7 405313.4 g4137010 2025 2346
7 405313.4 4111488H1 2026 2288
7 405313.4 2153069H1 2034 2315
7 405313.4 870657R1 2041 2613
7 405313.4 870657H1 2041 2250
7 405313.4 5433876H1 2063 2317
7 405313.4 g3162264 2073 2338
7 405313.4 g3057393 2080 2337
7 405313.4 g3872586 2081 2335
7 405313.4 2081907T6 2084 2288
7 405313.4 659926H1 2097 2337
7 405313.4 056422H1 2097 2317
7 405313.4 3486469H1 2105 2337
7 405313.4 817086R1 2127 2337
7 405313.4 817086H1 2127 2388
7 405313.4 817086T1 2127 2280
7 405313.4 g3597649 2142 2335
7 405313.4 3988965H1 2205 2499
7 405313.4 2664989H1 1633 1850
7 405313.4 3962192H1 1646 1776
7 405313.4 2290557H1 1645 1921
7 405313.4 [illegible] 1647 1861
7 405313.4 g670108 1649 1960
7 405313.4 g570685 1650 1964
7 405313.4 2213032H1 1653 1919
7 405313.4 1667188H1 1653 1775
7 405313.4 2285586H1 1653 1841
7 405313.4 990792H1 1669 1968
7 405313.4 1283707T6 1682 2306
7 405313.4 3781966H1 1686 2020
7 405313.4 2402093H1 1700 1946
7 405313.4 3666248H1 1707 1866
7 405313.4 1948647T6 1711 2289
7 405313.4 5681549H1 1746 2012
7 405313.4 2886455T6 1752 2292
7 405313.4 1301039H1 1790 2095
7 405313.4 1669180T6 1799 2303
7 405313.4 1669180H1 1808 2038
7 405313.4 3436329H1 1815 1956
7 405313.4 g1624740 1828 2218
7 405313.4 1559555T6 1831 2302
7 405313.4 3106278H1 1846 2121
7 405313.4 2323323T6 1848 2308
7 405313.4 1914641H1 1848 2062
7 405313.4 1628853T6 1854 2300
7 405313.4 g2946487 1855 2335
7 405313.4 2081907F6 1854 2313
7 405313.4 2081917H1 1854 2015
7 405313.4 g672957 1859 2199
7 405313.4 2916179H1 1878 2177
7 405313.4 3556793T6 1881 2306
7 405313.4 g2783325 1885 2345
7 405313.4 1268187F1 1888 2344
7 405313.4 1268187H1 1888 2152
7 405313.4 1268187F6 1888 2203
7 405313.4 1268187T6 1890 2317
7 405313.4 g616527 1892 2244
7 405313.4 g3593850 1896 2335
7 405313.4 g573001 1897 2264
7 405313.4 g815353 1898 2253
7 405313.4 g4083770 1902 2335
7 405313.4 g4281927 1912 2337
7 405313.4 g2753877 1930 2345
7 405313.4 2199211H1 1930 2188
7 405313.4 2375908H1 1934 2181
7 405313.4 g3797979 1954 2337
7 405313.4 2653312H1 1963 2221
7 405313.4 g4265408 1967 2342
7 405313.4 g3919084 1969 2335
7 405313.4 g4085496 1970 2334
7 405313.4 g668546 1976 2156
7 405313.4 2149388H1 1985 2274
7 405313.4 2601556H1 1995 2280
7 405313.4 g2789326 2898 3179
7 405313.4 g646138 2913 3179
7 405313.4 g888680 2916 3206
7 405313.4 g646137 2924 3179
7 405313.4 g645108 2927 3179
7 405313.4 g917579 2933 3178
7 405313.4 g3051580 2943 3179
7 405313.4 g3764150 2943 3179
7 405313.4 g2903435 2947 3179
7 405313.4 g1225232 2947 3179
7 405313.4 g1087854 2947 3111



7 405313.4 [illegible] 2998 3213
7 405313.4 [illegible] 3058 3210
7 405313.4 [illegible] 3087 3168
7 405313.4 [illegible] 3087 3167
7 405313.4 [illegible] 3089 3178
7 405313.4 [illegible] 2878 3179
7 405313.4 [illegible] 2884 3211
7 405313.4 [illegible] 2895 3216
7 405313.4 [illegible] 2231 2345
7 405313.4 1440914H1 2252 2421
7 405313.4 1440914F6 2252 2676
7 405313.4 4311940H1 2263 2546
7 405313.4 [illegible] 2266 2498
7 405313.4 [illegible] 2285 2554
7 405313.4 [illegible] 2287 2523
7 405313.4 2323179H1 2324 2460
7 405313.4 [illegible] 2324 2581
7 405313.4 g2197270 2337 2696
7 405313.4 2294826H1 2444 2517
7 405313.4 g1043992 2444 2683
7 405313.4 3860110H1 2444 2682
7 405313.4 3246952H1 2472 2725
7 405313.4 [illegible] 2480 2773
7 405313.4 [illegible] 2495 2794
7 405313.4 [illegible] 2623 2930
7 405313.4 [illegible] 2660 2922
7 405313.4 [illegible] 2666 3025
7 405313.4 2014171H1 2668 2943
7 405313.4 [illegible] 2667 3021
7 405313.4 [illegible] 2686 2955
7 405313.4 [illegible] 2695 2996
7 405313.4 [illegible] 2772 3181
7 405313.4 [illegible] 2772 3179
7 405313.4 [illegible] 2773 3179
7 405313.4 [illegible] 2775 3179
7 405313.4 [illegible] [illegible] 3179
7 405313.4 [illegible] 2778 3143
7 405313.4 [illegible] 2802 3087
7 405313.4 [illegible] 2806 3105
7 405313.4 [illegible] 2820 3182
7 405313.4 [illegible] [illegible] 3179
7 405313.4 g4450984 2833 3179
7 405313.4 g2099917 2853 3348
7 405313.4 g2018248 2860 2990
7 405313.4 g564562 2869 3179
7 405313.4 g2459191 2869 3206
7 405313.4 g1099005 2521 2798
7 405313.4 [illegible] 2559 2794
7 405313.4 g1198836 2578 2840
7 405313.4 2229784H1 2603 2852



7 405313.4 2229548H1 2603 2858
7 405313.4 1321926H1 1 235
7 405313.4 1321926F6 1 388
7 405313.4 1337104H1 8 265
7 405313.4 2682113H1 8 281
7 405313.4 g2754255 1191 1654
7 405313.4 4541436H1 1206 1470
7 405313.4 g2558152 1209 1673
7 405313.4 g2540638 1218 1616
7 405313.4 1282095H1 1225 1484
7 405313.4 1283707H1 1225 1511
7 405313.4 1282048H1 1225 1506
7 405313.4 1283707F6 1226 1659
7 405313.4 3280293H1 1234 1511
7 405313.4 5398765H1 1241 1391
7 405313.4 5072115H1 1293 1556
7 405313.4 5219645H1 1319 1443
7 405313.4 1893001H1 1317 1616
7 405313.4 3696826H1 1319 1614
7 405313.4 5920192H1 1319 1647
7 405313.4 g3770003 1322 1609
7 405313.4 5665259H1 1327 1555
7 405313.4 3151495H1 1326 1625
7 405313.4 3357256H2 1345 1500
7 405313.4 692886H1 1353 1604
7 405313.4 5400503H1 1388 1623
7 405313.4 2323323H1 1395 1657
7 405313.4 2323323R6 1395 1895
7 405313.4 4622312H1 1404 1718
7 405313.4 5373496H1 1438 1700
7 405313.4 624120H1 1456 1728
7 405313.4 2535460H1 1469 1750
7 405313.4 2115405H1 1505 1801
7 405313.4 g954059 1516 1748
7 405313.4 2292561H1 1538 1790
7 405313.4 2556287H1 1580 1842
7 405313.4 g866975 1610 1953
7 405313.4 861911R1 1619 2200
7 405313.4 861911H1 1619 1876
7 405313.4 g873285 1626 2006
8 436857.2 232864F1 1467 1959
8 436857.2 2704880T6 1506 1925
8 436857.2 g4373224 1522 1968
8 436857.2 g4112872 1522 1954
8 436857.2 g4390509 1556 1962
8 436857.2 6858881H1 1624 1888
8 436857.2 5267222H1 1653 1883
8 436857.2 1477850T6 1680 2225
8 436857.2 4617960T6 1761 2215
8 436857.2 g917596 1901 2223



8 436857.2 3270104H1 1906 2161
8 436857.2 5782044H1 2003 2209
8 436857.2 g4268407 2010 2240
8 436857.2 g3051680 2063 2241
8 436857.2 481468H1 2063 2250
8 436857.2 5683178H1 1 212
8 436857.2 1477850H1 56 278
8 436857.2 1477850F6 56 488
8 436857.2 4619212H1 138 290
8 436857.2 g1978924 207 556
8 436857.2 3758509H1 206 498
8 436857.2 4617960H1 323 550
8 436857.2 4718961H1 323 555
8 436857.2 4617960F6 323 761
8 436857.2 1992924H1 366 651
8 436857.2 g4269060 528 627
8 436857.2 4255690H1 599 830
8 436857.2 4761770H1 727 1004
8 436857.2 4613106H1 795 1034
8 436857.2 g2000739 795 1025
8 436857.2 2704880H1 883 1176
8 436857.2 2704880F6 883 1313
8 436857.2 2707669H1 961 1264
8 436857.2 805609H1 1054 1283
8 436857.2 805609R1 1054 1630
8 436857.2 4135963H1 1113 1415
8 436857.2 4294249H1 1152 1398
8 436857.2 5450335H1 1156 1421
8 436857.2 5373082H1 1203 1419
8 436857.2 4190554H1 1247 1412



